Varoufakis versus Nash: Learning in unprofitable games
Josef Hofbauer, Vienna
http://yanisvaroufakis.eu/books/game-theory-a-critical-text/
Shaun P. Hargreaves Heap and Yanis Varoufakis:
Game Theory. A Critical Text. (2004): Game 2.11, p. 62
        C1       C2       C3
R1     1,1    100,0   -100,1
R2   0,100      1,1    100,1
R3  1,-100    1,100      1,1
(1)
They write:
In this game the unique Nash equilibrium in pure strategies seems
unreasonable to us. The unique pure strategy Nash equilibrium is
(R1,C1), but note that player R can secure, with absolute certainty,
exactly the same payoff (1) as that associated with (R1,C1) by
playing R3. In sharp contrast, R1 comes with the risk of −100 if
C plays C3. Does the fact that R1 may also yield +100 utils (if C
responds to her R1 with C2) cancel out the risk of playing R1?
The critique of Hargreaves Heap and Varoufakis sounds convincing.
Since C3 is the maximin strategy for player 2, and is likely to be
played by a risk-averse player, it is indeed quite risky to play R1.
So the more reasonable play in this game is R3 and C3, respectively,
at least in the one-shot situation.
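These claims about game (1) can be checked mechanically. Below is a minimal sketch (the helper names `pure_nash` and `maximin` are mine, not from the text) that enumerates the pure equilibria and the pure security strategies:

```python
# Sketch: verify by enumeration that (R1, C1) is the unique pure NE of
# game (1) and that R3 / C3 are the pure maximin strategies, value 1.
R = [[1, 100, -100],   # row player's payoffs
     [0,   1,  100],
     [1,   1,    1]]
C = [[1,    0,   1],   # column player's payoffs
     [100,  1,   1],
     [-100, 100, 1]]

def pure_nash(R, C):
    """All pure profiles (i, j) where both players play a best reply."""
    ne = []
    for i in range(3):
        for j in range(3):
            if R[i][j] == max(R[k][j] for k in range(3)) and \
               C[i][j] == max(C[i][k] for k in range(3)):
                ne.append((i, j))
    return ne

def maximin(M, rows=True):
    """Pure security level: best worst-case payoff and the strategy index."""
    if rows:
        worst = [min(M[i]) for i in range(3)]
    else:
        worst = [min(M[i][j] for i in range(3)) for j in range(3)]
    v = max(worst)
    return v, worst.index(v)

print(pure_nash(R, C))         # the unique pure NE
print(maximin(R, rows=True))   # row player's security level and strategy
print(maximin(C, rows=False))  # column player's security level and strategy
```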
Harsanyi (1964): similar examples of 2 × 2 games
Aumann & Maschler (1972), Aumann (1985)
L R
T N,0 0,1
B 0,1 1,0
NE: x = (T+B)/2, y = (L + N·R)/(1+N),
equilibrium payoffs: N/(N+1) and 1/2.
prudent strategies: x = (T + N·B)/(1+N) and y = (L+R)/2.
But many people prefer to play T with probability x ≫ 1/2 because
of the attractive payoff N ≫ 1.
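The stated equilibrium and prudent strategies can be verified for a concrete N; the sketch below (variable names are mine) checks the indifference conditions and the guaranteed payoffs exactly with rational arithmetic:

```python
# Sketch: check the NE and prudent strategies of the Aumann-Maschler
# game for N = 5 (any N > 1 works the same way).
from fractions import Fraction as F

N = F(5)
A = [[N, 0], [0, 1]]   # row player's payoffs, (T, B) x (L, R)
B = [[0, 1], [1, 0]]   # column player's payoffs

# Claimed NE: x = (T+B)/2, y = (L + N*R)/(1+N).
x = [F(1, 2), F(1, 2)]
y = [F(1) / (1 + N), N / (1 + N)]
payoff_T = A[0][0]*y[0] + A[0][1]*y[1]
payoff_B = A[1][0]*y[0] + A[1][1]*y[1]
payoff_L = B[0][0]*x[0] + B[1][0]*x[1]
payoff_R = B[0][1]*x[0] + B[1][1]*x[1]
assert payoff_T == payoff_B == N / (N + 1)   # row's equilibrium payoff
assert payoff_L == payoff_R == F(1, 2)       # column's equilibrium payoff

# Prudent strategies: x = (T + N*B)/(1+N), y = (L+R)/2.
xp = [F(1) / (1 + N), N / (1 + N)]
guar_row = min(A[0][0]*xp[0] + A[1][0]*xp[1],   # against L
               A[0][1]*xp[0] + A[1][1]*xp[1])   # against R
yp = [F(1, 2), F(1, 2)]
guar_col = min(B[0][0]*yp[0] + B[0][1]*yp[1],   # against T
               B[1][0]*yp[0] + B[1][1]*yp[1])   # against B
# The prudent strategies guarantee exactly the equilibrium payoffs.
assert guar_row == N / (N + 1) and guar_col == F(1, 2)
```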
Holler (1990):
L R
T 1,1 1,0
B 2,1 0,4
This is again a cyclic 2 × 2 game.
NE: x = (3/4, 1/4), y = (1/2, 1/2), equilibrium payoffs (1,1).
These payoffs are guaranteed by the pure maximin strategies T and L.
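The same mechanical check works for Holler's game (again, the variable names are my own):

```python
# Sketch: in Holler's game the mixed NE yields (1,1), which the pure
# maximin strategies T and L already guarantee.
from fractions import Fraction as F

A = [[1, 1], [2, 0]]    # row player's payoffs
B = [[1, 0], [1, 4]]    # column player's payoffs
x = [F(3, 4), F(1, 4)]  # NE mix over (T, B)
y = [F(1, 2), F(1, 2)]  # NE mix over (L, R)

row_vs = [sum(A[i][j] * y[j] for j in range(2)) for i in range(2)]
col_vs = [sum(B[i][j] * x[i] for i in range(2)) for j in range(2)]
assert row_vs[0] == row_vs[1] == 1   # row indifferent, payoff 1
assert col_vs[0] == col_vs[1] == 1   # column indifferent, payoff 1

# Pure maximin: T guarantees min(1, 1) = 1; L guarantees min(1, 1) = 1.
assert min(A[0]) == 1
assert min(B[i][0] for i in range(2)) == 1
```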
unprofitable game: for each player, no equilibrium yields more than
his maximin payoff.
Harsanyi (1964) argues that in such a situation it is more rational
to use a maximin strategy (even if it is not an equilibrium strategy)
than it is to use an equilibrium strategy (if it is not a maximin
strategy).
Harsanyi (1966) formulates Rationality Postulates for Game
Situations. The first one is the Maximin Postulate:
In a game G unprofitable to you, always use a maximin strategy. (In
other words, if you cannot hope to obtain more than your maximin
payoff anyhow, then use a strategy that will absolutely assure you
at least that much.)
Aumann and Maschler (1972):
To be sure, we do not know what to recommend in this situa-
tion; but the maximin strategies seem preferable to the equilibrium
strategies.
. . .
Under these conditions, the use of the equilibrium strategies does
not seem reasonable.
Can players learn Nash equilibrium?
cyclic 2 × 2 games:
learning procedures oscillate around the unique equilibrium; many
(FP, regret matching, ...) converge to the NE
in HH & Varoufakis’ game:
R2 is best reply to C3, then C1 is player 2’s best reply against R2,
and in the long run, the NE (R1, C1) might emerge.
Can players learn Nash equilibrium?
more general symmetric two-person game, with the same strategic
structure (subtract 1 from each entry of the payoff matrix):
A =
0 a −b
−1 0 c
0 0 0
with a, b, c > 0. (2)
Game (1) corresponds to the case a = c = 99, b = 101.
Besides (R1, C1), the game (2) has another, completely mixed and
symmetric NE, given by the proportions (ac : b : a) for both players.
For game (1), it is (99² : 101 : 99), which is rather close to the
pure NE. The expected payoff at the mixed equilibrium equals 0 =
payoff at the pure equilibrium = maximin payoff for game (2).
If a game has 2 NE, then one of them is not robust (odd number
theorem, index theorem): here, the pure NE. So it is the mixed
equilibrium which seems more relevant.
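That the proportions (ac : b : a) equalize all three payoffs in game (2) can be seen directly:

```latex
\[
x \propto (ac,\, b,\, a) \quad\Longrightarrow\quad
\begin{aligned}
(Ax)_1 &= a\,x_2 - b\,x_3 \propto ab - ba = 0,\\
(Ax)_2 &= -x_1 + c\,x_3 \propto -ac + ca = 0,\\
(Ax)_3 &= 0,
\end{aligned}
\]
so every strategy earns the same payoff 0 against this mix.
```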
Replicator dynamics
(EWA, reinforcement learning, proportional imitation rule, ...)
ẋ1 = x1(a x2 − b x3 − ū)
ẋ2 = x2(−x1 + c x3 − ū)
ẋ3 = x3(−ū)
(3)
where ū = x·Ax is the average payoff.
coordinate transformation x = x2/x3 and y = x1/x3 (together with a
change of velocity) yields
ẋ = x(c − y)
ẏ = y(−b + a x)
(4)
classical Lotka–Volterra predator–prey equation!
Closed orbits, surrounding the equilibrium x̄ = b/a, ȳ = c.
Hence (3) has only closed orbits around the mixed NE.
(x1 + a x2)/x3 − c log x1 − b log x2 + (b + c) log x3
is a constant of motion.
The time averages of these periodic solutions coincide with the
mixed NE, see H&S (1988, 1998).
A population of (boundedly rational, Darwinian) players on the
average plays the mixed NE, and the average payoff
(1/T) ∫₀ᵀ ū(x(t)) dt = 0.
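As a numerical sanity check (not part of the original text), one can integrate (3) with a hand-rolled Runge–Kutta scheme; the parameter values and the starting point below are arbitrary choices of mine:

```python
# Sketch: integrate the replicator dynamics (3) and check two claims:
# the stated constant of motion is conserved along the orbit, and the
# time average of the orbit approximates the mixed NE (ac : b : a).
import math

a, b, c = 2.0, 1.0, 1.0

def vec(x):
    # mean payoff u = x.Ax, then the replicator vector field (3)
    u = x[0]*(a*x[1] - b*x[2]) + x[1]*(-x[0] + c*x[2])
    return [x[0]*(a*x[1] - b*x[2] - u),
            x[1]*(-x[0] + c*x[2] - u),
            x[2]*(-u)]

def H(x):
    # constant of motion from the text
    return (x[0] + a*x[1])/x[2] - c*math.log(x[0]) - b*math.log(x[1]) \
           + (b + c)*math.log(x[2])

x = [0.2, 0.3, 0.5]          # arbitrary interior starting point
dt, T = 0.01, 400.0
steps = int(T/dt)
h0 = H(x)
avg = [0.0, 0.0, 0.0]
for _ in range(steps):       # classical RK4 step
    k1 = vec(x)
    k2 = vec([x[i] + 0.5*dt*k1[i] for i in range(3)])
    k3 = vec([x[i] + 0.5*dt*k2[i] for i in range(3)])
    k4 = vec([x[i] + dt*k3[i] for i in range(3)])
    x = [x[i] + dt*(k1[i] + 2*k2[i] + 2*k3[i] + k4[i])/6 for i in range(3)]
    for i in range(3):
        avg[i] += x[i]/steps

ne = [a*c, b, a]             # mixed NE proportions (ac : b : a)
s = sum(ne)
ne = [v/s for v in ne]       # here (0.4, 0.2, 0.4)
print(abs(H(x) - h0))        # ~0: H is conserved
print(avg, ne)               # time average close to the mixed NE
```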
Best response dynamics
The population distribution x ∈ ∆ moves continuously towards a
best reply.
ẋ ∈ BR(x) − x (5)
The best reply regions are:
1 = BR(x) iff ax2 − bx3 > −x1 + cx3 and ax2 − bx3 > 0.
2 = BR(x) iff −x1 + cx3 > ax2 − bx3 and −x1 + cx3 > 0.
3 = BR(x) iff 0 > ax2 − bx3 and 0 > −x1 + cx3.
BR paths always head towards a pure strategy.
The function V(x) = maxi(Ax)i = max{a x2 − b x3, −x1 + c x3, 0} satis-
fies V̇ = −V along solutions, and hence V(x(t)) → 0. The attractor
{x : V(x) = 0} = {x : a x2 − b x3 ≤ 0, −x1 + c x3 ≤ 0} is the re-
gion where strategy 3 is a best reply. So we end up in the region
where the maximin strategy is best! So if a newcomer enters the
population (already at the attractor), he should play the maximin
strategy!
The attracting triangle {x : V (x) = 0} is a transitive region: it
contains a dense orbit.
Both NE are on the border of this region, but both are highly
unstable: there are solutions that start at Nash but leave it
(nonuniqueness). The pure NE cannot be reached from any other state
in finite time, but all solutions with V(x) > 0 converge to it! The
mixed NE can be reached in finite time from all states x with
V(x) = 0 (including the pure NE)!
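A crude Euler discretization of (5) illustrates both claims: V decays like e^(-t), and from an initial point with V > 0 the trajectory converges towards the pure NE on the boundary of the attractor {V = 0}. The parameters, starting point, and tie-breaking rule below are my own choices:

```python
# Sketch: discretized best response dynamics (5) for game (2).
a, b, c = 2.0, 1.0, 1.0

def payoffs(x):
    return [a*x[1] - b*x[2], -x[0] + c*x[2], 0.0]

def V(x):
    return max(payoffs(x))          # the max includes (Ax)_3 = 0

x = [0.6, 0.3, 0.1]                 # arbitrary state with V(x) > 0
dt = 0.01
v0 = V(x)
for _ in range(int(10/dt)):
    p = payoffs(x)
    br = p.index(max(p))            # a pure best reply (ties broken low)
    e = [0.0, 0.0, 0.0]
    e[br] = 1.0
    x = [x[i] + dt*(e[i] - x[i]) for i in range(3)]

p = payoffs(x)
print(V(x) / v0)                    # ≈ e^{-10}: exponential decay of V
print(p[0], p[1])                   # both ≈ 0 or negative: at the
                                    # boundary of {V = 0}
```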
Summary
While Hargreaves Heap and Varoufakis are perfectly right in their
critique of Nash equilibrium as a suggestion for actual play, we show
that in an evolutionary or learning context, at least the mixed NE
of games (1) and (2) has some relevance.
Can one save Nash in this way for every unprofitable game?
Extension I
more general (symmetric) unprofitable games (game (2) is the case d = 0)
A =
0 a −b
−1 d c
0 0 0
with b > 0, c > 0, a > d. (6)
(top, left) is the unique pure NE.
(bottom, right) is the maximin strategy.
Morgan and Sefton (2002) studied the case b = 1, d = −1, a = c: the
simplest unprofitable games, with only three different payoffs
(0, −1, a). They performed experiments for a = 8/3 and a = 2/3.
The experimental payoff tables (each entry equals 4 + 3 × the
corresponding normalized payoff):
a = 8/3:          a = 2/3:
 4  12   1         4   6   1
 1   1  12         1   1   6
 4   4   4         4   4   4
transformation to Lotka–Volterra system: x = x2/x3 and y = x1/x3:
ẋ = x(c + d x − y)
ẏ = y(−b + a x)
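The sign of ac + bd that organizes the two cases comes from the interior rest point of this system:

```latex
\[
\dot y = 0 \;\Rightarrow\; \bar x = \frac{b}{a}, \qquad
\dot x = 0 \;\Rightarrow\; \bar y = c + d\,\bar x = \frac{ac + bd}{a},
\]
so a positive (interior) equilibrium exists exactly when $ac + bd > 0$.
```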
Case d < 0: predator–prey system
(L) ac + bd > 0 coexistence: globally stable interior equilibrium
→ completely mixed NE, globally attracting for (REP) and (BR).
(R) ac + bd ≤ 0 predator (= pure NE strategy) dies out
→ NE mixing between strategies 2 and 3, evolutionarily stable,
globally attracting for (REP), asymptotically stable for (BR).
Evolution and learning lead to NE.
BUT: in experiments, average play is far from NE and maximin!
in (R) game, players collect less than maximin payoff
in (L) game: slightly more
Case d > 0:
x = x(c + dx− y)
y = y(−b + ax)
There exists a positive equilibrium, a global repellor.
→ There exists a completely mixed NE, a global repellor for (REP),
a local repellor for (BR). Learning does not lead to NE.
Extension II
arbitrary (symmetric) RSP game - framed as an unprofitable game0 a2 −b3
−b1 0 a3
a1 −b2 0
∼−a1 a2 + b2 −b3
−a1 − b1 b2 a3
0 0 0
(ai, bi > 0).
pure maximin strategy
no pure NE
unique, completely mixed NE (symmetric)
at NE: expected payoff = 0
Gaunersdorfer & Hofbauer (GEB 1995):
If a1a2a3 < b1b2b3 the NE is a repellor for (REP) and (BRD).
Learning does not lead to the NE!
Example: M, N large:
 −1   2  −N
 −M   1   1
  0   0   0
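Matching this matrix with the normalized form above gives the parameters, and the repellor condition of Gaunersdorfer & Hofbauer is then immediate:

```latex
\[
a_1 = a_2 = a_3 = 1, \qquad b_1 = M-1, \quad b_2 = 1, \quad b_3 = N,
\]
\[
a_1 a_2 a_3 = 1 \;<\; (M-1)\,N = b_1 b_2 b_3
\quad \text{for large } M, N.
\]
```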
Conclusion
NE is an elegant mathematical solution concept for games.
NE are the fixed points of the BR correspondence, and they are fixed
points of many evolutionary and learning processes.
But is NE a reasonable suggestion for playing a game or a good
prediction of the outcomes of learning?
In some situations YES:
strict NE, zero-sum games, ESS, potential games
In general, I doubt it.