Gambits: Theory and Evidence

Shiva Maharaj*
ChessEd

Nick Polson†

Booth School of Business, University of Chicago

Christian Turk‡

Phillips Academy, Andover, MA

October 12, 2021

Chess is not a game. Chess is a well-defined form of computation. You may not be able to work out the answers, but in theory, there must be a solution, a right procedure in any position. —John von Neumann

Abstract

Gambits are central to human decision making. Our goal is to provide a theory of Gambits. A Gambit is a combination of psychological and technical factors designed to disrupt predictable play. Chess provides an environment to study gambits and behavioral economics. Our theory is based on the Bellman optimality path for sequential decision making. This allows us to calculate the Q-values of a Gambit where material (usually a pawn) is sacrificed for dynamic play. On the empirical side, we study the effectiveness of a number of popular chess Gambits. This is a natural setting as chess Gambits require a sequential assessment of a set of moves (a.k.a. policy) after the Gambit has been accepted. Our analysis uses Stockfish 14 to calculate the optimal Bellman Q-values. To test whether Bellman's equation holds in play, we estimate the transition probabilities to the next board state via a database of expert human play. This then allows us to test whether the Gambiteer is following the optimal path in his decision making. Our methodology is applied to the popular Stafford and reverse Stafford (a.k.a. Boden-Kieseritzky-Morphy) Gambit and other common ones including the Smith-Morra, Goring, Danish and Halloween Gambits. We conclude with directions for future research.

Key Words: AI, AlphaZero, Adversarial Risk Analysis, Behavioral Economics, Behavioral Game Theory, Behavioral Science, Chess Gambits, Decision Making, Deep Learning, Neural Network, Q-Learning, Stockfish 14, Stafford Gambit, Rationality, Skewness Preference

*Email: [email protected]   †Email: [email protected]   ‡Email: [email protected]

arXiv:2110.02755v2 [econ.TH] 11 Oct 2021

1 Introduction

What is the object of playing a Gambit opening? To acquire a reputation of being a dashing player at the cost of losing a game. —Siegbert Tarrasch

Gambits are central to human decision making. Our goal is to provide a theory of Gambits. Rational decision making under uncertainty has a long history dating back to Ramsey (1926) and de Finetti's (1931) original discussion of previsions and the avoidance of sure loss (a.k.a. Dutch book). This principle is central to the theory of maximum (subjective) expected utility which is prevalent in economic theory, see von Neumann and Morgenstern (1944), Savage (1954) and Edwards, Lindman and Savage (1963). Behavioral decision making seeks to catalogue all the "anomalies" in human decision making. The paper will focus on Gambits in chess, where a player offers a material advantage of a pawn (sub-optimal play) for dynamic play in the continuation of the game. Our paper has four goals. First, we will utilize Bellman's principle of optimality and Q-learning together with the latest chess engine, Stockfish 14, to determine, in a concrete way, the rationality of Gambits. Second, we will propose a method of ranking Gambits based on a player's preference for skew and risk. Third, we advocate a new environment, chess, as a way to study behavioral economics. Fourth, we provide directions for future research.

Our work is closely related to the theory of conflict and negotiation (Schelling, 1960, 2006), game theory (von Neumann and Morgenstern, 1944), and Coase's theorem on bargaining. Schelling's theory applies to a wide range of settings, from defender-attacker relationships to surprise attacks. Even the threat of a Gambit can have a psychological effect on the other player. Schelling views the strategy of conflict as a bargaining problem and provides numerous conflict situations in which an unintuitive and often self-limiting move helps you win.

Our theory of Gambits has that feature. The Gambit itself gives up a material advantage, a self-limiting policy, enabling speed and initiative in future play. A plan gets executed more quickly, and this positional gain is worth more than the minor material (usually a pawn) that is given up.

From a game theoretical perspective, it is well-known that in variable-sum games the actions of others can affect your optimal decisions. Our approach can be viewed as an application of Adversarial Risk Analysis (Rios, Ruggeri, Soyer and Wilson, 2020) in which we model the distributional probabilities of the next state of play by the defender of the Gambit, given our Gambit, via a database of human play. Here it is important to understand that our data is gathered from master games and therefore is not totally representative of the average chess player. In fact, the prevalence and success of Gambits is likely to decrease as player skill increases. Likewise, with waning opponent player skill, the ability to defend against aggressive play diminishes and the odds of success for the Gambiteer increase.

A Gambit is a combination of psychological and technical factors designed to disrupt predictable play. Once the Gambit is employed—and accepted—the side that accepts the Gambit must play with extreme accuracy to avoid the numerous pitfalls that follow, but enjoys a certain win with optimal play. A useful running technical definition of a Gambit policy is then:

A Gambit policy has sure loss in at least one state of the world. But optimal play thereafter has advantages elsewhere, if optimal play is continued under the Bellman principle of optimality.

In terms of Q-values, which measure the gains to optimal play underlying the principle of optimality (see Section 2.1), we have a formal definition:

A Gambit policy G, with action aG(s) in state s, is one in which the value Q(s, aG(s)) is negative in the state where the opponent subsequently plays in an optimal fashion, but is positive in all other states where the opponent's play is sub-optimal, giving the Gambiteer an advantage if he/she plays optimally thereafter.

What has not been studied is the propensity for humans to take on Gambits. We will offer a number of economic and behavioral explanations and detailed calculations using principles from sequential decision making. Our goal is to study Gambits from both a theoretical and empirical perspective. In the case of the Gambiteer, they are willing to sacrifice a pawn because defending against the Gambit is at least as hard as playing it. This is nicely summarised by the following quote of Karpov, the 12th World Champion:

It doesn't require much for misfortune to strike in the King's Gambit, one incautious move, and Black can be on the edge of the abyss. —Anatoly Karpov

An accepted Gambit transfers what is called in chess the "initiative" to the Gambiteer. When a player has the initiative, he or she is in control of the style and pace of the game, like a football team in possession of the ball. Although Gambits are now easily refuted by Stockfish 14, they still perform well in human play.

John von Neumann described chess as a two-player zero-sum game with perfect information and proved the minimax theorem in 1928. His famous quote about chess being a form of computation has been realised in modern day AI with the use of deep neural networks to calculate Q-values in Bellman's equation. This has a number of important ramifications for human play—in a given position there is an optimal action. There is no "style" of play—just optimal play where one simply optimises the probability of winning and follows the optimal Bellman path of play.

Chess AI was pioneered by Turing (1948), von Neumann, and Shannon (1950), who developed algorithms for solving chess. Shannon's approach was one of trial and error and "learning" the optimal policy. Turing (and Champernowne) valued the pieces and in addition they had the following positional evaluation functions: piece mobility, piece safety, king mobility, king safety, and castling. Modern day methods are based on Q-learning (a.k.a. reinforcement learning). Solving chess is a daunting NP-hard computational problem, with the Shannon number, which measures the number of possible board states, being 10^152 (with 10^43 legal moves). A major advance over pure look-ahead calculation engines is deep neural networks, which interpolate the value and policy functions from empirical game playing. For example, AlphaZero uses self-play to allow quick solution paths to be calculated and "learns" chess in less than four hours without any prior knowledge, see Dean et al. (2012) and Silver et al. (2017) for further discussion.

From an economically motivated perspective, we argue the Gambiteer is willing to forego a small material advantage for positive skewness in the resulting distribution of optimal continuation Q-values. In this sense, the Gambiteer is irrational: the decision maker should only worry about optimising future Q-values, weighted by the transition probabilities of future states given the current state and action, and not the skewness of the distribution. Preference for skewness has been seen in many other contexts, for example, lotteries and stock markets (Harvey and Siddique, 2000). There has been extensive literature showing a human preference for positively skewed odds. In simpler settings, a similar rationale has been applied to bluffing in poker (von Neumann and Morgenstern, 1944; Ferguson and Ferguson, 2003).

Our focus is on one particular aspect of chess: Gambits. The word originates from the Italian word "gambetto", which most directly translates to "the act of tripping". For the sake of clarity, recall our running definition: a Gambit is a policy that has sure loss in at least one state of the world, but advantages elsewhere, if optimal play is continued under the principle of optimality. Gambits are commonly played from beginner to advanced levels of chess; however, many people still hold questions about their optimality. Why Gambits? Our research has shown that Gambits are irrational under the context of optimal play from the opponent; it is always true that the Q-value before the Gambit is higher than the Q-value after the Gambit under optimal play. So why are Gambits some of the most common and deadly openings? The answer lies in skewness preference and humans' natural tendency towards error when making complex decisions.

Chess is one of the oldest games in the world, with early versions of the game being traced back to India in the seventh century. It has gained a reputation as one of the trickiest and most complex games to play. Recently, however, chess has seen a surge in popularity due to online websites like chess.com and lichess.org, with nearly 600 million people reporting to play the game regularly. In the last twenty years, chess theory has seen great improvement in large part due to the emergence of chess engines such as Stockfish 14 and Leela Chess Zero. AlphaZero (Sadler and Regan, 2019) was the original deep neural network engine, but only a handful of its games have been published and access to the engine is limited. A more extensive review of the history of chess engines and how they work is given in Section 2.

At first glance, chess may not seem like an apparent place to study economics and decision-making. In chess and most economic theory, agents are presumed to be acting in the most rational, stable manner. In concurrence with the work of behavioral economists, we will be discussing our finding of a so-called "anomaly" in human behavior in chess, see Camerer and Thaler (1995).

What makes chess a quintessential environment for behavioral economic study? Successful behavioral experiments demand two critical aspects at their core, otherwise they run the risk of being waved off and ignored. First, they need to be repeatable in order to have large sample sizes on human behavior. For example, one common line of thought attempting to explain behavioral anomalies follows: if people are given the chance to perform multiple times, they will learn from their mistakes (Thaler, 2014). Second, experiments need incentives; stakes. Uncompromising economists try to make the difficult-to-disprove claim that if the stakes are raised, anomalies in behavior will disappear. A fruitful behavioral theory will be able to counter these arguments.

The rest of our paper is outlined as follows. The next subsection provides connections with previous research. Section 2 provides an in-depth analysis of rationality and sequential decision making via stochastic dynamic programming. In particular, we describe how to calculate Q-values for a decision making process and show how the principle of optimality provides a constraint that we can test. Section 3 applies our methodology to the Stafford and Boden-Kieseritzky-Morphy Gambits. Section 4 applies it to the Smith-Morra, Goring, Danish and Halloween Gambits. Section 5 concludes and provides directions for future research.

1.1 Connections with Behavioral Game Theory

Our work builds on an extensive literature of behavioral decision making (Tversky, 1972) and game theory, see Barberis and Thaler (2012) and Gigerenzer and Brighton (2009). While behavioral games are immensely useful, they are stylized models in simplified settings and may not generalise to more complex, highly strategic or highly incentivized settings (Camerer, 1997). There is little to no work on Gambits and complex sequential decision problems. Humans are more likely to try Gambits in complex settings. The advent of optimal chess engine moves and assessments now allows us to study such phenomena.

Holdaway and Vul (2021) analyze over one billion online chess games for human behaviour. They find that players not only exhibit state-dependent risk preferences, but also change their risk-taking strategy depending on their opponent, and that this effect differs in experts and novices. As Camerer observes: "Born or made? I think good investors are like chess players. You have to know a lot. Partly it's temperament. To be a great investor you need a weird combination of aggression and patience."

Kahneman and Tversky (2013) describe loss aversion and behavioral biases, and Camerer and Ho (2015) discuss models of such behaviour, including cognitive hierarchies and k-level thinking. We also build on the fairness equilibrium, which incorporates manners into economics (Rabin, 1993). In many markets, politeness pays off. Schelling famously views conflict as a bargaining problem, and chess Gambits seem to be no different; rather paradoxically, in such situations, a party can strengthen its position by overtly worsening its own options. The capability to retaliate can be more useful than the ability to resist an attack. Uncertain retaliation is more credible and more efficient than certain retaliation, as was known to the Greek philosopher Xenophon—we ought to learn from our very position that there is no safety for us except in victory.

We also build upon the foundation provided by Jacob Aagaard, who wrote a particularly relevant section in his book series about applying Daniel Kahneman's "system 1" and "system 2" thinking to chess. Kahneman distinguishes between system 1—'fast', automatic, and subconscious thought—and system 2—'slow', calculating, effortful, and conscious thought (Kahneman, 2013). Aagaard applies these principles to chess, where players encounter different types of moves, and discusses their corresponding systems and how they can lead to potential for mistakes (Aagaard, 2018).

Our proposal is that chess is an environment ripe to study behavioral economics and game theory. To be deemed acceptable, the chess environment should satisfy the two critical criteria mentioned above: repeatability and stakes. Although not a popular choice historically, in recent years, with the advent of the internet and online chess, modern databases give researchers and players alike access to millions (8.4 million to be exact) of human-generated games, offering the chess environment as a solution to these counter-arguments.

Million-game databases provide immense diversity of ideas and, more importantly, the first critical aspect: repeatability. Tournaments also act as a built-in stake modulator—state and national championships are high-stake, high-incentive situations, while online blitz (short time-control) games tend to be lower stake.

2 Chess AI and Optimal Play

Chess problems are the hymn tunes of mathematics. —G. H. Hardy

The dynamic programming method known as Q-learning breaks the decision problem into smaller sub-problems. Bellman's principle of optimality describes how to do this:

Bellman Principle of Optimality: An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision. (Bellman, 1957)

Backwards induction identifies the optimal action at the last node in the decision tree (a.k.a. checkmate). Using this information, one can then determine what to do at the second-to-last time of decision. This process continues backwards until one has determined the best action for every possible situation (a.k.a. solving the Bellman equation).

Chess NNUEs. First, one needs an objective function. In the case of chess it is simply the probability of winning the game. Chess engines optimize the probability of a win via Bellman's equation and use deep learning to evaluate the value and policy functions. The value function V(s) is simply the probability of winning (from 100%, a certain win, to 0%, a certain loss). For a given state of the board, denoted by s, the value function is given by

$$ V(s) = P(\mathrm{winning} \mid s). $$

The corresponding Q-value is the probability of winning, given policy or move a in the given state s and following the optimal Bellman path thereafter. We write

$$ Q(s, a) = P(\mathrm{winning} \mid s, a). $$

NN engines like AlphaZero don't use centi-pawn evaluations of a position, but we can simply transform from centi-pawns to probabilities as follows. The win probability P(winning | s) is related to the centi-pawn advantage c(s) in state s of the board via the identity

$$ w(s) = P(\mathrm{winning} \mid s) = \frac{1}{1 + 10^{-c(s)/4}} \quad \text{and} \quad c(s) = 4 \log_{10}\left(\frac{w(s)}{1 - w(s)}\right). $$

This in turn can be related to the Elo rating (Glickman, 1995). Hence this will allow us to test the rationality of Gambits by measuring the difference between optimal play and Gambit play, using the optimal Bellman Q-values weighted by the transition probabilities p(s'|s, a) estimated from human databases. At the beginning of the game, Stockfish 14 estimates that the centi-pawn advantage is c(0) = 0.2, corresponding to P(white winning) = 0.524. Another way of estimating transition probabilities of human play is to use Maia. This chess engine is a neural network like AlphaZero, trained on actual human play from the 8-billion-game database on Lichess. It has 9 levels of difficulty corresponding to different Elo levels, and it is the most accurate engine at predicting human play at every level. Hence, it could provide more accurate estimates via simulation of Maia games, see McIlroy-Young et al. (2020). We leave this as a direction for future research. See Maharaj et al. (2021) for a comparison of other engine architectures such as Stockfish and Leela Chess Zero.
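As a minimal sketch of the conversion just described (treating c(s) exactly as written in the identity above), the mapping between evaluation and win probability takes only a few lines of Python; the function names and the sanity check below are illustrative only.

```python
import math

def win_prob(c: float) -> float:
    """Win probability w(s) = 1 / (1 + 10^(-c(s)/4)), with c(s) as in the identity above."""
    return 1.0 / (1.0 + 10.0 ** (-c / 4.0))

def advantage(w: float) -> float:
    """Inverse map: c(s) = 4 * log10( w(s) / (1 - w(s)) )."""
    return 4.0 * math.log10(w / (1.0 - w))

# Illustrative check: the text quotes c(0) = 0.2 for the initial position with
# P(white winning) = 0.524; the formula as written returns approximately 0.53.
print(win_prob(0.2))
```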

The optimal sequential decision problem is solved by Q-learning, which calculates the Q-matrix (Korsos and Polson, 2014; Polson and Witte, 2015), denoted by Q(s, a) for state s and action a. The Q-value matrix describes the value of performing action a (chess move) in our current state s (chess board position) and then acting optimally henceforth. The current optimal policy and value function are given by

$$ V(s) = \max_a Q(s, a) = Q(s, a^\star(s)), \qquad a^\star(s) = \arg\max_a Q(s, a). $$

The policy function simply finds the optimal map from states to actions, and substituting into the Q-values we obtain the value function at a given state s.

Stockfish 14 simply takes the probability of winning as the objective function. Hence at each stage V(s) measures the probability of winning. This is typically reported as a centi-pawn advantage.

The Bellman equation for Q-values, assuming an instantaneous utility u(s, a) and a time-inhomogeneous Q matrix, is the constraint

$$ Q(s, a) = u(s, a) + \sum_{s' \in S} P(s' \mid s, a) \max_{a'} Q(s', a') $$

where P(s'|s, a) denotes the transition matrix of states and describes the probability of moving to new state s' given current state s and action a. The new state is clearly dependent on the current action in chess and not a random assignment. Bellman's optimality principle is therefore simply describing the constraint for optimal play as one in which the current value is a sum over all future paths of the probabilistically weighted optimal future values of the next state.

Taking the maximum value over the current action a yields

$$ V(s) = \max_a \left\{ u(s, a) + \sum_{s' \in S} P(s' \mid s, a) V(s') \right\}, \quad \text{where } V(s') = \max_{a'} Q(s', a'). $$
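The following is a minimal sketch (not the authors' code) of the Bellman backup and the policy/value extraction defined above, with states and actions as plain strings and the transition probabilities supplied externally; all names are illustrative.

```python
from typing import Dict, Tuple

def bellman_q(s: str, a: str,
              P: Dict[Tuple[str, str], Dict[str, float]],
              Q_next: Dict[str, Dict[str, float]],
              u: float = 0.0) -> float:
    """Q(s, a) = u(s, a) + sum_{s'} P(s'|s, a) * max_{a'} Q(s', a')."""
    return u + sum(p * max(Q_next[s_next].values())
                   for s_next, p in P[(s, a)].items())

def value_and_policy(Q_s: Dict[str, float]) -> Tuple[float, str]:
    """V(s) = max_a Q(s, a) and a*(s) = argmax_a Q(s, a), given the row Q(s, .)."""
    a_star = max(Q_s, key=Q_s.get)
    return Q_s[a_star], a_star
```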

Definition: A Gambit policy aG has the following property in terms of Q-values:

Q(s', aG) < 0 in the state s' where the opponent replies optimally, and Q(s', aG) > 0 in all other states.

However, in order to better model human behavior, we will use human game data archived in chess databases such as ChessBase 15 to get a better understanding of human decision-making and risk management when playing with and against Gambits. We will call the human likelihood of an action the "player probability", namely P(s'|s, a).
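A hedged sketch of how such player probabilities could be estimated: count how often games in a database that reach a given position continue with each legal move. The snippet assumes the open-source python-chess package and a PGN export of the database; the file path and function names are illustrative, not the authors' pipeline.

```python
import chess
import chess.pgn
from collections import Counter

def player_probabilities(pgn_path: str, target_fen: str) -> dict:
    """Empirical move frequencies P(s'|s, a) observed from the position given by target_fen."""
    target = chess.Board(target_fen).board_fen()   # compare piece placement only
    counts = Counter()
    with open(pgn_path) as handle:
        while True:
            game = chess.pgn.read_game(handle)
            if game is None:
                break
            board = game.board()
            for move in game.mainline_moves():
                if board.board_fen() == target:
                    counts[board.san(move)] += 1   # record the human reply in SAN
                    break                          # count only the first visit per game
                board.push(move)
    total = sum(counts.values())
    return {san: n / total for san, n in counts.items()} if total else {}
```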

2.1 Ranking Gambits by Q-Values

Table 1 shows Stockfish 14's evaluation of common Gambits. They yield negative Q-values, as required in our definition. We will "rank" some of our personal Gambit favorites by their initial Q-values. In a later section we will perform the same ranking on a smaller subset using a more rigorous analysis based on the skewness of future Q-values.

The Queen's Gambit is ineptly named. White appears to be sacrificing the c4 pawn by allowing Black to take it; however, the material is easily regained following e3 and soon after Bxc4. This is why it is the only "Gambit" on our list with a positive initial Q-value.

Opening                              Q-Value
Stafford Gambit                      -2.56
Halloween Gambit                     -1.93
Boden-Kieseritzky-Morphy Gambit      -0.87
King's Gambit                        -0.76
Budapest Gambit                      -0.75
Blackmar-Diemer Gambit               -0.56
Goring Gambit                        -0.35
Danish Gambit                        -0.35
Smith-Morra Gambit                   -0.32
Evans Gambit                         -0.25
Queen's Gambit*                      +0.39

Table 1: Gambit Q-values: Q(s', aG) < 0

The previous ranking of Gambits considers only the computer evaluation. Every Gambit, barring the misnomer Queen's Gambit, was evaluated as unsound according to the computer. In some sense, the computer is the oracle, as it is as close as we have to a completely rational agent that understands and knows the optimal lines.


Our proposal is to rank gambits by their skew and volatility.

Opening                          Skew
Smith-Morra Gambit Variation 1   -0.20
Halloween Gambit Variation 2     +0.34
Danish Gambit Variation 1        +0.89
Stafford Gambit Variation 1      +0.92
Smith-Morra Variation 2          +1.32
Halloween Gambit Variation 1     +1.37
Reverse Stafford Gambit          +1.39
Stafford Gambit Variation 2      +1.45
Danish Gambit Variation 2        +1.49

Table 2: Gambit Skew

Opening                          Volatility
Stafford Gambit Variation 1      0.038
Smith-Morra Gambit Variation 1   0.056
Halloween Gambit Variation 1     0.065
Halloween Gambit Variation 2     0.089
Reverse Stafford Gambit          0.089
Danish Gambit Variation 1        0.096
Smith-Morra Gambit Variation 2   0.104
Stafford Gambit Variation 2      0.112
Danish Gambit Variation 2        0.188

Table 3: Gambit Volatility

The Smith-Morra (Variation 1) is unique in that it has negative skew. It is well known that this is a good gambit, maybe partly due to the fact that computer chess engines rank the Sicilian Defense poorly.

3 Testing Optimality of Gambits

Figure 1 illustrates the path of Q-values for a Gambit strategy. At the point G where the Gambit is played, the Gambiteer offers a material advantage, corresponding to giving up QG in Q-values. The future path of the optimal objective function Q-values is then illustrated by the red line, where the opponent who was challenged to accept the Gambit then plays optimally in the future, leading to sure loss for the Gambiteer. The green line illustrates the path of Q-values for victory for the Gambiteer.

From our working definition,

A Gambit policy has sure loss in at least one state of the world. But optimal play thereafter has advantages elsewhere, if optimal play is continued under the principle of Bellman optimality.

[Figure 1: Gambit Q-values. The value function path starts near 0.526, drops by QG at the point G where the Gambit is played, and then follows either the sure-loss path under optimal defence or the winning path under sub-optimal defence.]

In order to test rationality, we need to calculate:

1. The value function V(s') (a.k.a. probability of winning) for optimal play. These are simply obtained from Stockfish 14 from a transformed version of the centi-pawn advantage.

2. The transition probability matrix P(s'|s, a). We use a database of human games.

Then we simply use the test statistic T(G) defined by the difference between Q-values

$$ T(G) = V(s) - Q(s, a_G) = Q(s, a^\star(s)) - Q(s, a_G). $$

This measures how much the Gambiteer leaves on the table by playing the Gambit action aG rather than the optimal play a*(s).

The trade-off is the exchange for a distribution of Q-values in the next period given by

$$ \{ Q(s', a_G) : s' \in S \}. $$

Looking at the skew of the Q-value distribution is key; remember that, by construction, one state has sure loss.

Hence Gambits are sub-optimal, as they do not satisfy the Bellman equation; rather, the Gambiteer is expressing a preference for skewness in the Q-values in the next period, but the Gambit is optimal given an error by the opponent in the next period.

Now we apply our understanding of Q-learning to develop a strategy to analyze and rank Gambits. We will perform our analysis for each of the Gambits based on two "critical moments" where the Gambiteer could gain an advantage from sub-optimal play by the opponent.

Restating the Bellman Principle of Optimality as

$$ Q(s, a^\star(s)) \text{ is optimal} \iff Q(s', a^\star(s')) \geq Q(s, a^\star(s)) $$

allows us to determine whether playing Gambits is rational by "testing" to see if the equation is satisfied, implying optimal play.
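As a small illustrative sketch (not the authors' code), the test statistic and the restated optimality condition can be checked directly from two engine evaluations; the function names are ours and the numbers are taken from the Stafford figures quoted later in the paper.

```python
def gambit_regret(v_s: float, q_gambit: float) -> float:
    """T(G) = V(s) - Q(s, a_G) = Q(s, a*(s)) - Q(s, a_G); positive values indicate sub-optimality."""
    return v_s - q_gambit

def bellman_consistent(q_before: float, q_after: float) -> bool:
    """Restated optimality condition along the played path: Q(s', a*(s')) >= Q(s, a*(s))."""
    return q_after >= q_before

# Stafford Gambit numbers quoted in Section 4.1 (Black's perspective, pawn units):
# Q(s, a*(s)) is about -0.57 and Q(s', a_G) about -2.55, so T(G) is approximately 1.98 > 0
# and the optimality condition fails (False), matching the paper's conclusion.
print(gambit_regret(-0.57, -2.55), bellman_consistent(-0.57, -2.55))
```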


3.1 Skewness Preferences

However, in order to more accurately model human Gambit skewness behavior, we will need to adjust our Q-values. First we use the aforementioned centi-pawn conversion to arrive at a win probability. Then, using human data from a chess database, we assign probabilities to each move. Now we can weight each action's win probability by its frequency of play, and use our formula for skew to conclude a skewness preference.

Let Q denote the random variable that measures the future Q-values at the next time point. Define the skewness via

$$ \kappa_G = \sum_{s'} p(s' \mid s, a_G(s)) \left( \frac{Q(s', a_G) - Q^\star}{\sigma_G} \right)^3 $$

where Q* is defined as the optimal Q-value from the Bellman equation, namely

$$ Q^\star = \sum_{s'} p(s' \mid s, a_G(s)) V(s'). $$

The volatility of the Gambit strategy is defined by

$$ \sigma_G^2 = \sum_{s'} p(s' \mid s, a_G(s)) \left\{ Q(s', a_G) - Q^\star \right\}^2. $$

For example, for the Stafford Gambit, using our empirical data to estimate the state transition probabilities with human play, we find that

$$ S(\text{Stafford}) = +0.92. $$

For the entire database, the skew is S = +1.70 with volatility σ = 0.094. This aligns with behavioural research on skewness preference for positive skew (Harvey and Siddique, 2000).
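A minimal sketch of the skew and volatility calculations above, assuming the player probabilities p(s'|s, aG), the continuation values Q(s', aG) and the engine values V(s') are supplied as dictionaries keyed by successor state; the inputs and names below are illustrative, not the paper's data.

```python
def gambit_skew_and_volatility(p: dict, q: dict, v: dict):
    """Return (kappa_G, sigma_G) as defined above.

    p[s']: estimated player probability p(s'|s, a_G(s))
    q[s']: continuation Q-value Q(s', a_G)
    v[s']: value function V(s') from the engine
    """
    q_star = sum(p[s] * v[s] for s in p)                  # Q*, the Bellman benchmark
    var = sum(p[s] * (q[s] - q_star) ** 2 for s in p)     # sigma_G^2
    sigma = var ** 0.5
    kappa = sum(p[s] * ((q[s] - q_star) / sigma) ** 3 for s in p)
    return kappa, sigma

# Illustrative (made-up) continuation data for three successor states:
p = {"s1": 0.6, "s2": 0.3, "s3": 0.1}
q = {"s1": 0.45, "s2": 0.55, "s3": 0.80}
v = {"s1": 0.50, "s2": 0.55, "s3": 0.80}
print(gambit_skew_and_volatility(p, q, v))
```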

4 Gambit Evaluations

Thinking and planning things will go this or that way—but they are messy affairs. —GM John Nunn

In this section we will perform an analysis of some of the most popular and dubious Gambits. We will be using an efficiently updatable neural network (NNUE) chess engine, such as Stockfish 14, to derive current and future Q-values. Some future Q-values are rooted in "system 1", automatic moves that a player losing concentration might make, while others are conscious "system 2" moves, some of the best in the position.
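As a hedged sketch of how such engine evaluations could be obtained in practice, the snippet below plays out a gambit line, queries a UCI engine for a centipawn score and reports it in pawn units, as in the tables that follow. It assumes the python-chess package and a local Stockfish binary whose path is illustrative; this is not necessarily the authors' exact pipeline.

```python
import chess
import chess.engine

# Stafford Gambit, Variation 1 (the line given in Section 4.1).
STAFFORD = ["e4", "e5", "Nf3", "Nf6", "Nxe5", "Nc6", "Nxc6", "dxc6", "d3", "Bc5"]

def evaluate_line(moves, engine_path="stockfish", depth=20):
    """Return the engine evaluation (pawn units, White's perspective) after a list of SAN moves."""
    board = chess.Board()
    for san in moves:
        board.push_san(san)
    engine = chess.engine.SimpleEngine.popen_uci(engine_path)
    try:
        info = engine.analyse(board, chess.engine.Limit(depth=depth))
        cp = info["score"].white().score(mate_score=10000)   # centipawns from White's view
    finally:
        engine.quit()
    return cp / 100.0

# The paper's table for this position reports roughly +2.5 for White.
print(evaluate_line(STAFFORD))
```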

4.1 Stafford Gambit

The Stafford Gambit is an opening for Black where the Gambiteer allows White to capture a central pawn early on in exchange for extreme piece activity and space. It has gained popularity among beginner- and novice-level players due to Twitch streamers like Eric Rosen and GothamChess broadcasting themselves playing the Gambit to thousands of live viewers.


The position below occurs soon after the Gambit is offered by Black from the following line of play: 1 e4 e5 2 Nf3 Nf6 3 Nxe5 Nc6 4 Nxc6 dxc6 5 d3 Bc5

[Chessboard diagram: the position after 5...Bc5 in the line above.]

Current Q-Value      +2.56
Pre-Gambit Q-Value   +0.57
Skew                 +0.92
Volatility (σ)        0.038

We will display two charts for each position. The first will contain the current Q-value of the position; the pre-Gambit Q-value, which is the evaluation prior to the Gambiteer offering the Gambit; the skew of the data (positive or negative); and finally the volatility (σ). The second chart will display raw data for the next 5 possible future moves. Player probability represents the likelihood of a real human player making a move, gathered from data in the database. Win probability is simply a converted Q-value to make our data easier to understand.

White Move 6 (after 5 d3)   Be2    Nc3    Bg5    f3     Be3
Q-value                     +2.56  -1.48  -6.20  +1.74  +0.87
Player Probability          59%    4%     4%     4%     14%
Win Probability             81%    30%    3%     73%    62%

An alternative line of play arises from 1 e4 e5 2 Nf3 Nf6 3 Nxe5 Nc6 4 Nxc6 dxc6 5 Nc3 Bc5


[Chessboard diagram: the position after 5...Bc5 in the 5 Nc3 line above.]

Current Q-Value      +2.52
Pre-Gambit Q-Value   +0.57
Skew                 +1.45
Volatility (σ)        0.11

White Move 6 (after 5 Nc3)   h3     Bc4    d3     Be2    Qe2
Q-value                      +2.56  -1.43  -1.55  +0.14  +1.62
Player Probability           20%    6%     7%     60%    7%
Win Probability              81%    20%    29%    52%    72%

Stafford Gambit. The board above displays the initial starting position for the popular Gambit for Black against e4, except with an alternative line of play. For each of the following positions, instead of the current Q-value in the second half of the optimality principle, we will use the "pre-Gambit Q-value", as some of the selected moments take place already well into the Gambit, leading to an already very negative position.

The Stafford Gambit is played by Black and, in terms of Q-values,

$$ -2.55 < -0.57, \qquad \text{i.e.,} \quad Q(s', a^\star(s')) < Q(s, a^\star(s)). $$

Hence, the Stafford Gambit is highly sub-optimal, although a very popular human choice. A simple refutation is pf3 followed by pc3.

4.2 Reverse Stafford Gambit

The reverse Stafford Gambit (a.k.a. Boden-Kieseritzky-Morphy Gambit) has a win percentage of 62% among all White openings, one of the highest ever. It is seldom played by grandmasters because White sacrifices a pawn early on. However, the inexperienced player can easily make an extremely costly mistake.

The line of play is given by 1 e4 e5 2 Bc4 Nf6 3 Nf3 Nxe4 4 Nc3 Nxc3 5 dxc3


[Chessboard diagram: the position after 5 dxc3 in the line above.]

Current Q-Value      -0.87
Pre-Gambit Q-Value   +0.62
Skew                 +1.39
Volatility (σ)        0.089

Black Move 5           f6     c6     d6     Bc5    Nc6
Q-value                -0.74  0.00   +3.63  +2.36  +3.34
Player Probability     60%    14%    4%     4%     4%
Win Probability        40%    50%    89%    79%    87%

4.3 Smith-Morra Gambit

The Smith-Morra Gambit is an opening by White to counter the popular Black opening, the Sicilian Defense. It utilizes a common theme among Gambits, exchanging a pawn for piece activity and space. The line of play is given by 1 e4 c5 2 d4 cxd4 3 c3

[Chessboard diagram: the position after 3 c3 in the line above.]


Current Q-Value      -0.32
Pre-Gambit Q-Value   +0.34
Skew                 -0.20
Volatility (σ)        0.06

It is worth noting that the Smith-Morra Gambit is the only Gambit we evaluated to have a negative skew value. This is likely because, despite White giving up a pawn, the computer evaluation is not significantly against them, meaning in the eyes of the oracle, the exchange for activity in this position is worth it. Therefore, when players are choosing an opening, the Smith-Morra Gambit (accepted variation shown above) pays off for White assuming slightly sub-optimal play from the opponent.

Black Move 3           dxc3   d3     g6     Nf6    d5
Q-value                -0.17  +0.93  +0.90  0.00   +0.36
Player Probability     32%    11%    24%    18%    4%
Win Probability        48%    63%    63%    50%    55%

The line of play is given by 1 e4 c5 2 d4 cxd4 3 c3 dxc3 4 Nxc3 Nc6 5 Nf3 d6 6 Bc4 e6 7 O-O Nf6 8 Qe2 Be7 9 Rd1 Bd7 10 Nb5 Qb8 11 Bf4

[Chessboard diagram: the position after 11 Bf4 in the line above.]

Current Q-Value      +0.00
Pre-Gambit Q-Value   +0.34
Skew                 +1.32
Volatility (σ)        0.103

Black Move 11          Ne5    e5     O-O    Kf8    Qd8
Q-value                0.00   +1.00  +1.70  +4.64  +5.37
Player Probability     12%    50%    12%    12%    12%
Win Probability        50%    64%    73%    94%    96%

Variation 2 of the Smith-Morra Gambit tells a similar story: if Black plays optimally, he earns his right to an equal position. But with one error in concentration, an appearance of 'system 1', the entire position can go south extremely fast for him, resulting in a major advantage for the Gambiteer.


4.4 Halloween Gambit

The Halloween Gambit, also known as the Leipzig Gambit, is an extremely aggressive opening for White in which he sacrifices a knight for one of Black's central pawns. We gather two of the most critical moments where a seemingly natural move from Black leads to catastrophic failure.

The line of play is given by 1 e4 e5 2 Nf3 Nc6 3 Nc3 Nf6 4 Nxe5 Nxe5 5 d4 Ng6 6 e5 Ng8 7 h4 Bb4 8 h5 N6e7 9 Qg4 g6 10 hxg6 Nxg6 11 Qg3 N8e7 12 Bg5

[Chessboard diagram: the position after 12 Bg5 in the line above.]

Current Q-Value      -0.86
Pre-Gambit Q-Value   +0.17
Skew                 +1.38
Volatility (σ)        0.06

Black Move 12          O-O        d6     d5     Nf5    Bxc3+
Q-value                Mate in 5  -0.41  -1.45  -2.43  -0.81
Player Probability     20%        14%    16%    38%    12%
Win Probability        100%       44%    30%    20%    39%

The line of play is given by 1 e4 e5 2 Nf3 Nc6 3 Nc3 Nf6 4 Nxe5 Nxe5 5 d4 Ng6 6 e5 Ng8 7 Bc4 Bb4 8 Qf3 f6 9 O-O d5 10 exd6 Bxd6 11 Ne4


[Chessboard diagram: the position after 11 Ne4 in the line above.]

The continuation values are given by

Current Q-Value      -0.90
Pre-Gambit Q-Value   +0.17
Skew                 +0.35
Volatility (σ)        0.09

Black Move 11          N8e7   Bd7    Be7    Qe7    Kf8
Q-value                +1.55  +1.12  -1.17  +1.10  -0.70
Player Probability     25%    6%     25%    35%    6%
Win Probability        71%    66%    34%    65%    40%

We now turn to the Danish and Goring Gambits.

4.5 Danish and Goring Gambit

The Goring Gambit arises from White gambiting the d4 pawn after the initial move e4. The Danish Gambit adds one more level to the Gambit by also offering the c3 pawn. If both Gambits are accepted, then the Gambiteer gets rapid development of his two bishops for the price of losing two pawns. Black neglects his kingside development and king safety, which White then tries to aggressively exploit.

The line of play is given by 1 e4 e5 2 d4 exd4 3 c3 dxc3 4 Nxc3


[Chessboard diagram: the position after 4 Nxc3 in the line above.]

Current Q-Value      -0.35
Pre-Gambit Q-Value   +0.40
Skew                 +0.89
Volatility (σ)        0.10

Black Move 4           Nc6    Nf6    d6     Bc5    Bb4
Q-value                -0.17  +1.35  0.00   -0.40  -0.20
Player Probability     53%    3%     20%    3%     20%
Win Probability        48%    69%    50%    44%    47%

Now consider the following line of play: 1 e4 e5 2 d4 exd4 3 c3 dxc3 4 Nxc3 Nc6 5 Bc4 Nf6 6 Nf3 d6 7 Qb3

[Chessboard diagram: the position after 7 Qb3 in the line above.]

The table of continuation values is given by


Current Q-Value       0.00
Pre-Gambit Q-Value   +0.40
Skew                 +1.49
Volatility (σ)        0.19

Black Move 7           Qd7    Be6    Be7    Qe7    d5
Q-value                -0.22  +1.97  +2.54  +1.00  +3.36
Player Probability     92%    1%     1%     4%     1%
Win Probability        47%    76%    81%    64%    87%

Again one can see the skewness in the continuation values.

5 Discussion

We provide a theory of Gambits together with empirical evidence from chess. Surprisingly little academic work has been directed towards Gambits, even though they are an archetypal example of human decision-making. From a game theory perspective, variable-sum games such as chess are ultra-competitive. For a number of reasons, chess provides a natural empirical setting to study Gambits. We provide a detailed analysis of a number of commonly played Gambits. On the one hand, a Gambit policy is not rational; the Gambit leaves open one state of the world where the opponent can win with certainty. On the other hand, the Gambit leads the Gambiteer to an advantage with any sub-optimal play from the opponent. We provide a number of economic motivations for playing Gambits: from signaling a player type, to a commitment strategy to play optimally and dynamically, to the theory of conflict and contract theory, where surrendering an upside has been shown to be a good negotiation tool.

From a decision-making and behavioral game theory viewpoint, the Gambiteer is expressing a preference for skewness in the Q-values associated with their optimal path of play. Rational planning requires computation of the Bellman optimality path, and even the best laid plans can fall by the wayside. "No battle plan ever survives first contact with the enemy" (Helmuth von Moltke), and in chess strategy one has to be dynamic. If the chess market were efficient, Gambits would not be played. Our view is that agents have an irrational preference for skew in the Q-values and that Gambits can thrive at all levels of the game, even the most competitive high-stakes matches.

From an economic perspective, Tarrasch's view was that the goal of the Gambiteer is not just winning the game but also one of signaling his type to the market of players. His reputation counts. Modern-day grandmasters (such as Magnus Carlsen), however, are more frequently declining Gambits (with no need to let the other player "prove" his policy) with the increasing prevalence of and access to supercomputers and engines, thus making themselves unavailable for such threats. We have entered a world of cat and mouse, a departure from the romantic era of being gentlemanly and accepting Gambits with the burden of proof on the other player.

There are many directions for future research, both theoretical and empirical. For example, will AI provide new strategies for human chess play and Gambits? Are there other fields where it will suggest new organizational and decision-making procedures? Will AI lead to new methods of cooperation, or new methods for the allocation of resources? In many ways, one human view of AI is that it has bypassed a lot of work that is now unnecessary for humans to perform and opened the door to more leisure time and learning for humans (Keynes, 1930). For each Gambit, we discuss a player probability derived from the likelihood of a player making that move according to the database. It is worth mentioning, however, that there exists a database with 8 billion chess games covering play at all levels. Our database was limited to expert play, and thus contained only 2 million games. With access to larger databases, how can we study where exactly humans, not necessarily just experts, are making their mistakes? Could these mistakes be repeatable and predictable, and if so, how can we prevent them, and perhaps how can we apply this learning to other areas of study?

6 References

Aagaard, J., (2018). ”Grandmaster Preparation: Thinking Inside the Box,” Quality Chess.

Maharaj, S., N. Polson, A. Turk (2021). ”Chess AI: Competing Paradigms for Machine Intelligence”.arXiv:2109.11602.

Barberis, N. and R. Thaler (2012). "A Survey of Behavioural Finance," Advances in Behavioral Finance, Volume III, pg. 1-76.

Camerer, C., G. Loewenstein, and M. Rabin (2004). ”Advances in Behavioral Economics,” PrincetonUniversity Press.

Camerer, C. and T. Ho (2015). "Behavioral Game Theory Experiments and Modeling," Handbook of Game Theory with Economic Applications, 4, 517-573.

Camerer, C. (1997). "Progress in Behavioral Game Theory," Journal of Economic Perspectives, 11(4), pg. 167-188.

Camerer, C. and R. Thaler (1995). "Anomalies: Ultimatums, Dictators, and Manners," Journal of Economic Perspectives, 9(2), pg. 209-219.

Dean, J., G. Corrado, R. Monga, K. Chen, M. Devin, Q. Le, M. Mao, M. Ranzato, A. Senior, P. Tucker, K. Yang, A. Ng (2012). "Large Scale Distributed Deep Networks," Advances in Neural Information Processing Systems, 25, pg. 1223-1231.

de Finetti, B. (1936). ”Les Probabilites Nulles,” Bulletin de Sciences Mathematiques, 275-288.

Edwards, W., H. Lindman and J. Savage (1963). "Bayesian Statistical Inference for Psychological Research," Psychological Review, 70(3), pg. 193-242.

Ferguson, C. and T. Ferguson (2003). "On the von Neumann Model of Poker," Game Theory and Applications.

Franchi, F. (2005). "Chess, Games, and Files", Essays in Philosophy, Vol. 6, Iss. 1, Article 6.

Gigerenzer, G., and H. Brighton (2009). "Homo Heuristicus: Why Biased Minds Make Better Inferences", Topics in Cognitive Science, Vol. 1, Iss. 1, pg. 107-143.

Glickman, M. (1995). ”A Comprehensive Guide To Chess Ratings,” American Chess Journal, 3(1), pg.59-102.

Harvey, C. and Siddique, A. (2000). "Conditional Skewness in Asset Pricing Tests," The Journal of Finance, 55(3), pg. 1263-1295.

Holdaway, C., E. Vul (2021). "Risk-taking in adversarial games: What can 1 billion online chess games tell us?", Proceedings of the Annual Meeting of the Cognitive Science Society, 43.


Kahneman, D., and A. Tversky (2013). "Prospect Theory: An Analysis of Decision under Risk," Handbook of Financial Decision Making, Part I, 99-127.

Kahneman, D. (2013). "Thinking, Fast and Slow," Farrar, Straus and Giroux.

Keynes, J. M. (1930). "Economic Possibilities for our Grandchildren."

Korsos, L. and N. G. Polson (2014). ”Analyzing Risky Choices: Q-Learning for Deal-No-Deal,”Applied Stochastic Models in Business and Industry, 30(3), 258-270.

McIlroy-Young, R., Kleinberg, J., Sen, S., Anderson, A. (2020). "Aligning Superhuman AI with Human Behavior: Chess as a Model System", ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2020, 26, pg. 1677-1687.

Moxley, J., Ericsson, K., Charness, N., Krampe, R. (2012). ”The role of intuition and deliberativethinking in experts’ superior tactical decision-making”, Cognition, vol. 124, pg. 72-78.

Pievatolo, A., F. Ruggeri, R. Soyer, S. Wilson (2021). ”Decisions in Risk and Reliability: An Explana-tory Perspective,” Stats 2021, 4, pg. 228–250.

Polson, N.G. and J. Scott (2018). "AIQ: How People and Machines are Smarter Together", St. Martin's Press.

Polson, N.G. and M. Sorensen (2012). "A Simulation-Based Approach to Stochastic Dynamic Programming," Applied Stochastic Models in Business and Industry, 27(2), pg. 151-163.

Polson, N.G. and J. Witte (2015). ”A Bellman view of Jesse Livermore”, Chance, 28(1), pg. 27-31.

Rabin, M. (1993). "Incorporating Fairness into Game Theory and Economics," American Economic Review, LXXXIII, pg. 1281-1302.

Ramsey, F. (1926). "Truth and Probability," The Foundations of Mathematics and Other Logical Essays.

Rios, D., Ruggeri, F., Soyer, R., Wilson, S. (2020). "Advances in Bayesian Decision Making in Reliability", European Journal of Operational Research, 282(1), pg. 1-18.

Sadler, M. and N. Regan (2019). "Game Changer: AlphaZero's Groundbreaking Chess Strategies and the Promise of AI," New in Chess.

Savage, J. (1954). ”Foundations of Statistics.” Wiley.

Schelling, T. (1960). ”The Strategy of Conflict”. Harvard University Press.

Schelling, T. (2006), ”Micromotives and Macrobehavior,” Norton.

Shannon, C. E. (1950). ”Programming a Computer to Play Chess,” Philosophical Magazine, 7(41), pg.314.

Shiller, E. (2011). Attack with the Boden-Kieseritzky-Morphy Gambit, Chessworks.

Silver, D., A. Huang, C. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, D. Hassabis (2017). "Mastering the Game of Go without Human Knowledge," Nature, 550, 354-359.

Thaler, R. (2014). ”Misbehaving: The Making of Behavioral Economics,” WW Norton.

Turing, A. (1948). ”Intelligent Machinery,” The Essential Turing, pg. 395-432.


Tversky, A. (1972). "Elimination by Aspects: A Theory of Choice", Psychological Review, 79, pg. 281-299.

von Neumann, J. and O. Morgenstern (1944). "Theory of Games and Economic Behavior," Princeton University Press.
