
Page 1:

Varoufakis versus Nash: Learning in unprofitable games

Josef Hofbauer, Vienna

Page 3:

http://yanisvaroufakis.eu/books/game-theory-a-critical-text/

Shaun P. Hargreaves Heap and Yanis Varoufakis:
Game Theory: A Critical Text (2004), Game 2.11, p. 62

        C1         C2         C3
R1    1, 1      100, 0    −100, 1
R2    0, 100      1, 1     100, 1
R3    1, −100     1, 100     1, 1

(1)

Page 4:

        C1         C2         C3
R1    1, 1      100, 0    −100, 1
R2    0, 100      1, 1     100, 1
R3    1, −100     1, 100     1, 1

They write:

"In this game the unique Nash equilibrium in pure strategies seems unreasonable to us. The unique pure strategy Nash equilibrium is (R1, C1), but note that player R can secure, with absolute certainty, exactly the same payoff (1) as that associated with (R1, C1) by playing R3. In sharp contrast, R1 comes with the risk of −100 if C plays C3. Does the fact that R1 may also yield +100 utils (if C responds to her R1 with C2) cancel out the risk of playing R1?"

Page 5:

The critique of Hargreaves Heap and Varoufakis sounds convincing. Since C3 is the maximin strategy for player 2 and is likely to be played by a risk-averse player, it is indeed quite risky to play R1. So the more reasonable play in this game is R3 and C3, respectively, at least in the one-shot situation.
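These claims are easy to check mechanically. A minimal sketch (not part of the original slides; plain Python with numpy) that scans game (1) for pure Nash equilibria and pure maximin strategies:

```python
import numpy as np

# Game (1): payoffs of the row player R and the column player C.
R = np.array([[   1, 100, -100],
              [   0,   1,  100],
              [   1,   1,    1]])
C = np.array([[   1,   0,    1],
              [ 100,   1,    1],
              [-100, 100,    1]])

# Pure Nash equilibria: each player's strategy is a best reply to the other's.
pure_ne = [(i, j) for i in range(3) for j in range(3)
           if R[i, j] == R[:, j].max() and C[i, j] == C[i, :].max()]
print([(f"R{i+1}", f"C{j+1}") for i, j in pure_ne])        # [('R1', 'C1')]

# Pure maximin strategies: maximize the worst-case payoff.
print(f"row maximin: R{np.argmax(R.min(axis=1)) + 1}")     # R3 guarantees 1
print(f"column maximin: C{np.argmax(C.min(axis=0)) + 1}")  # C3 guarantees 1
```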

Harsanyi (1964): similar examples of 2 × 2 games

Aumann & Maschler (1972), Aumann (1985)

       L       R
T    N, 0    0, 1
B    0, 1    1, 0

Page 6:

Aumann & Maschler (1972), Aumann (1985)

       L       R
T    N, 0    0, 1
B    0, 1    1, 0

NE: x = (T + B)/2, y = (L + N·R)/(1 + N); equilibrium payoffs: N/(N + 1) and 1/2.

Prudent (maximin) strategies: x = (T + N·B)/(1 + N) and y = (L + R)/2.

But many people prefer to play T with probability x >> 1/2 because of the attractive payoff N >> 1.
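These formulas follow from the usual indifference conditions; a small symbolic sketch (assuming sympy is available; for the prudent mixtures the equalization step is valid here because each player's two pure-strategy payoffs move in opposite directions as the mixing probability varies):

```python
import sympy as sp

N = sp.symbols('N', positive=True)
p, q = sp.symbols('p q')                  # p = prob. of T, q = prob. of L

# Aumann-Maschler game: rows T, B; columns L, R.
R = sp.Matrix([[N, 0], [0, 1]])           # player 1's payoffs
C = sp.Matrix([[0, 1], [1, 0]])           # player 2's payoffs

# Mixed NE: each player mixes so that the opponent is indifferent.
p_ne = sp.solve(sp.Eq(p*C[0, 0] + (1 - p)*C[1, 0], p*C[0, 1] + (1 - p)*C[1, 1]), p)[0]
q_ne = sp.solve(sp.Eq(q*R[0, 0] + (1 - q)*R[0, 1], q*R[1, 0] + (1 - q)*R[1, 1]), q)[0]
print(p_ne, q_ne)                         # 1/2 and 1/(N + 1): x = (T+B)/2, y = (L+N*R)/(1+N)
print(sp.simplify(q_ne*R[0, 0] + (1 - q_ne)*R[0, 1]))   # N/(N + 1), player 1's NE payoff

# Prudent (maximin) mixtures: equalize one's own payoff against the opponent's pure strategies.
p_pr = sp.solve(sp.Eq(p*R[0, 0] + (1 - p)*R[1, 0], p*R[0, 1] + (1 - p)*R[1, 1]), p)[0]
q_pr = sp.solve(sp.Eq(q*C[0, 0] + (1 - q)*C[0, 1], q*C[1, 0] + (1 - q)*C[1, 1]), q)[0]
print(p_pr, q_pr)                         # 1/(N + 1) and 1/2: x = (T+N*B)/(1+N), y = (L+R)/2
```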

Page 7:

Holler (1990):

       L       R
T    1, 1    1, 0
B    2, 1    0, 4

This is again a cyclic 2 × 2 game.

NE: x = (3/4, 1/4), y = (1/2, 1/2); equilibrium payoffs (1, 1).

These payoffs are guaranteed by the pure maximin strategies T and L.
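A quick numerical check of these values (illustrative sketch only, using numpy):

```python
import numpy as np

# Holler (1990): rows T, B; columns L, R.
R = np.array([[1., 1.], [2., 0.]])        # player 1's payoffs
C = np.array([[1., 0.], [1., 4.]])        # player 2's payoffs

x = np.array([3/4, 1/4])                  # claimed NE mixture of player 1
y = np.array([1/2, 1/2])                  # claimed NE mixture of player 2

print(R @ y)                              # [1. 1.] : player 1 indifferent between T and B
print(x @ C)                              # [1. 1.] : player 2 indifferent between L and R
print(x @ R @ y, x @ C @ y)               # equilibrium payoffs (1.0, 1.0)
print(R.min(axis=1), C.min(axis=0))       # worst cases: T and L each guarantee payoff 1
```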

Page 8:

Unprofitable game: for each player, no equilibrium yields more than his maximin payoff.

Harsanyi (1964) argues that in such a situation it is more rational to use a maximin strategy (even if it is not an equilibrium strategy) than it is to use an equilibrium strategy (if it is not a maximin strategy).

Harsanyi (1966) formulates Rationality Postulates for Game Situations. The first one is the Maximin Postulate:

In a game G unprofitable to you, always use a maximin strategy. (In other words, if you cannot hope to obtain more than your maximin payoff anyhow, then use a strategy that will absolutely assure you at least that much.)

Page 9:

Aumann and Maschler (1972):

To be sure, we do not know what to recommend in this situation; but the maximin strategies seem preferable to the equilibrium strategies.

. . .

Under these conditions, the use of the equilibrium strategies does not seem reasonable.

Page 10:

Can players learn Nash equilibrium?

Cyclic 2 × 2 games: learning procedures oscillate around the unique equilibrium; many (fictitious play, regret matching, ...) converge to the NE.

In HH & Varoufakis' game: R2 is the best reply to C3, then C1 is player 2's best reply against R2, and in the long run the NE (R1, C1) might emerge.

Page 11:

Can players learn Nash equilibrium?

A more general symmetric two-person game with the same strategic structure (subtract 1 from each entry of the payoff matrix):

A =
      0    a   −b
     −1    0    c
      0    0    0

with a, b, c > 0.   (2)

Game (1) corresponds to the case a = c = 99, b = 101.

Besides (R1, C1), game (2) has another, completely mixed and symmetric NE, given by the proportions (ac : b : a) for both players. For game (1) this is (99² : 101 : 99), which is rather close to the pure NE. The expected payoff at the mixed equilibrium equals 0, which is also the payoff at the pure equilibrium and the maximin payoff of game (2).

If a game has two NE, then one of them is not robust (odd number theorem, index theorem): here it is the pure NE. So it is the mixed equilibrium which seems more relevant.
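As a sanity check (a sketch only), the proportions (ac : b : a) are easily verified to give a symmetric NE of game (2) with payoff 0, here with the values a = c = 99, b = 101 of game (1):

```python
import numpy as np

a, b, c = 99.0, 101.0, 99.0                  # game (1) corresponds to these values
A = np.array([[ 0, a, -b],
              [-1, 0,  c],
              [ 0, 0,  0]])

p = np.array([a*c, b, a])                    # proportions (ac : b : a)
p /= p.sum()                                 # normalize onto the simplex

print(p)                                     # ~[0.980, 0.0101, 0.0099]: close to the pure NE
print(A @ p)                                 # all (numerically) equal 0 -> p is a symmetric NE
print(p @ A @ p)                             # expected payoff at the mixed NE: 0
```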

Page 12:

Replicator dynamics:
(EWA, reinforcement learning, proportional imitation rule, ...)

ẋ1 = x1(a x2 − b x3 − ū)
ẋ2 = x2(−x1 + c x3 − ū)
ẋ3 = x3(−ū)
                                   (3)

where ū = x·Ax is the average payoff.

Coordinate transformation x = x2/x3 and y = x1/x3 (plus a rescaling of time):

ẋ = x(c − y)
ẏ = y(−b + a x)
                                   (4)

the classical Lotka–Volterra predator–prey equation!

Closed orbits, surrounding the equilibrium x = b/a, y = c.

Hence (3) has only closed orbits around the mixed NE.

(x1 + a x2)/x3 − c log x1 − b log x2 + (b + c) log x3

is a constant of motion.
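A numerical illustration (a sketch; the values a = b = c = 1 are an arbitrary choice that keeps the orbit comfortably inside the simplex, and scipy's solve_ivp is used as the integrator): the function above stays constant along a trajectory of (3).

```python
import numpy as np
from scipy.integrate import solve_ivp

a, b, c = 1.0, 1.0, 1.0                      # illustrative values; any a, b, c > 0 will do
A = np.array([[0, a, -b], [-1, 0, c], [0, 0, 0]])

def replicator(_, x):
    u = x @ A @ x                            # average payoff u = x.A x
    return x * (A @ x - u)                   # replicator equation (3)

def H(x):
    x1, x2, x3 = x
    return (x1 + a*x2)/x3 - c*np.log(x1) - b*np.log(x2) + (b + c)*np.log(x3)

sol = solve_ivp(replicator, (0.0, 60.0), [0.5, 0.3, 0.2],
                dense_output=True, rtol=1e-10, atol=1e-12)
H_vals = [H(sol.sol(t)) for t in np.linspace(0.0, 60.0, 600)]
print(max(H_vals) - min(H_vals))             # ~0 (up to integration error): a constant of motion
```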

Page 13:

The time averages of these periodic solutions coincide with the mixed NE, see Hofbauer & Sigmund (1988, 1998).

A population of (boundedly rational, Darwinian) players on average plays the mixed NE, and the average payoff

(1/T) ∫_0^T ū(x(t)) dt = 0.
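The same kind of simulation illustrates the time-average statement (again a sketch; a = 2, b = c = 1 and the horizon T = 400 are arbitrary illustrative choices):

```python
import numpy as np
from scipy.integrate import solve_ivp

a, b, c = 2.0, 1.0, 1.0                      # illustrative values
A = np.array([[0, a, -b], [-1, 0, c], [0, 0, 0]])

def replicator(_, x):
    u = x @ A @ x
    return x * (A @ x - u)

T = 400.0
ts = np.linspace(0.0, T, 40001)
sol = solve_ivp(replicator, (0.0, T), [0.5, 0.3, 0.2], t_eval=ts, rtol=1e-9, atol=1e-12)

x_avg = sol.y.mean(axis=1)                   # time average of the trajectory
u_avg = np.mean([x @ A @ x for x in sol.y.T])
mixed_ne = np.array([a*c, b, a]) / (a*c + b + a)

print(x_avg, mixed_ne)                       # the two vectors approximately agree
print(u_avg)                                 # approximately 0, the NE payoff
```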

Page 14:

A =
      0    a   −b
     −1    0    c
      0    0    0

with a, b, c > 0.

Best response dynamics

The population distribution x ∈ ∆ moves continuously towards a best reply:

ẋ ∈ BR(x) − x          (5)

The best reply regions are:

1 = BR(x) iff ax2 − bx3 > −x1 + cx3 and ax2 − bx3 > 0.

2 = BR(x) iff −x1 + cx3 > ax2 − bx3 and −x1 + cx3 > 0.

3 = BR(x) iff 0 > ax2 − bx3 and 0 > −x1 + cx3.

Page 15:

BR paths always head towards a pure strategy.

Page 16:

The function V(x) = max_i (Ax)_i = max{a x2 − b x3, −x1 + c x3, 0} satisfies dV/dt = −V along solutions (if i is the current best reply, then dV/dt = (A(e_i − x))_i = A_ii − (Ax)_i = −V, since the diagonal of A vanishes), and hence V(x(t)) → 0. The attractor {x : V(x) = 0} = {x : a x2 − b x3 ≤ 0, −x1 + c x3 ≤ 0} is the region where strategy 3 is a best reply. So we end up in the region where the maximin strategy is best!! So if a newcomer enters the population (already at the attractor), he should play the maximin strategy!

The attracting triangle {x : V(x) = 0} is a transitive region: it contains a dense orbit.

Both NE are on the border of this region, but both are highly unstable: there are solutions that start at Nash but leave it (non-uniqueness). The pure NE cannot be reached from any other state in finite time, but all solutions with V(x) > 0 converge to it! The mixed NE can be reached in finite time from all states x with V(x) = 0 (including the pure NE)!
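A crude Euler discretization of the best response dynamics (5) illustrates both claims (a sketch, not the exact piecewise-linear BR solution; a = c = 99, b = 101 as in game (1), an arbitrary interior starting point, ties broken by argmax): V(x(t)) decays to 0 while the trajectory, which starts with V(x) > 0, creeps towards the pure NE without ever reaching it.

```python
import numpy as np

a, b, c = 99.0, 101.0, 99.0
A = np.array([[0, a, -b], [-1, 0, c], [0, 0, 0]])

def best_reply(x):
    return int(np.argmax(A @ x))             # index of a best reply (ties -> lowest index)

def V(x):
    return (A @ x).max()                     # V(x) = max_i (Ax)_i >= 0

x = np.array([0.2, 0.5, 0.3])                # arbitrary starting point with V(x) > 0
dt = 0.01
for step in range(3001):
    if step % 500 == 0:
        print(f"t={step*dt:5.1f}  x={np.round(x, 4)}  V={V(x):.5f}  BR={best_reply(x) + 1}")
    e = np.eye(3)[best_reply(x)]             # pure best reply, a vertex of the simplex
    x = x + dt * (e - x)                     # Euler step of  dx/dt = BR(x) - x
# Output: the best reply switches from 2 to 1 early on, then x drifts towards the
# pure NE (1, 0, 0) while V shrinks towards 0, the level of the attracting triangle.
```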

Page 17:

Summary

While Hargreaves Heap and Varoufakis are perfectly right in their critique of Nash equilibrium as a suggestion for actual play, we show that in an evolutionary or learning context, at least the mixed NE of games (1) and (2) has some relevance.

Can one save Nash in this way for every unprofitable game?

Page 18:

Extension I

More general (symmetric) unprofitable games (game (2) is the case d = 0):

A =
      0    a   −b
     −1    d    c
      0    0    0

with b > 0, c > 0, a > d.   (6)

(top, left) is the unique pure NE. (bottom, right) is the maximin strategy.

Morgan and Sefton (2002) studied the case b = 1, d = −1, a = c: the simplest unprofitable games, with only three different payoffs 0, −1, a. They performed experiments for a = 8/3 and a = 2/3 (the matrices shown are 3A + 4, which has positive integer entries):

a = 8/3:             a = 2/3:
  4  12   1            4   6   1
  1   1  12            1   1   6
  4   4   4            4   4   4

Page 19:

Transformation to a Lotka–Volterra system: x = x2/x3 and y = x1/x3:

ẋ = x(c + d x − y)
ẏ = y(−b + a x)

Case d < 0: predator–prey system

(L) ac + bd > 0: coexistence, globally stable interior equilibrium
→ completely mixed NE, globally attracting for (REP) and (BR).

(R) ac + bd ≤ 0: the predator (= the pure NE strategy) dies out
→ NE mixing between strategies 2 and 3, evolutionarily stable, globally attracting for (REP), asymptotically stable for (BR).
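Which case the two Morgan-Sefton games fall into is a one-line computation: with b = 1, d = −1, c = a, the sign of ac + bd = a² − 1 decides (a sketch):

```python
# Morgan-Sefton parametrization: b = 1, d = -1, c = a.
for a in (8/3, 2/3):
    b, c, d = 1.0, a, -1.0
    case = "(L): interior equilibrium, mixed NE attracts" if a*c + b*d > 0 \
           else "(R): pure NE strategy dies out"
    print(f"a = {a:.3f}:  ac + bd = {a*c + b*d:+.3f}  ->  {case}")
```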

Page 21:

Evolution and learning lead to NE.

BUT: in experiments, average play is far from NE and maximin!
In the (R) game, players collect less than the maximin payoff; in the (L) game, slightly more.

Page 22:

Case d > 0:

ẋ = x(c + d x − y)
ẏ = y(−b + a x)

There exists a positive equilibrium, a global repeller.
→ There exists a completely mixed NE: a global repeller for (REP), a local repeller for (BR). Learning does not lead to NE.

Page 24:

Extension II

An arbitrary (symmetric) rock-scissors-paper (RSP) game, framed as an unprofitable game:

    0    a2   −b3             −a1        a2 + b2   −b3
  −b1     0    a3      ~      −a1 − b1   b2         a3
   a1   −b2     0               0         0          0

with ai, bi > 0.

pure maximin strategy (strategy 3)
no pure NE
unique, completely mixed NE (symmetric)
at the NE: expected payoff = 0

Gaunersdorfer & Hofbauer (GEB 1995):
If a1 a2 a3 < b1 b2 b3, the NE is a repeller for (REP) and (BR).

Learning does not lead to the NE!

Page 25:

Example (M, N large):

  −1    2   −N
  −M    1    1
   0    0    0
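Matching this example against the normal form of Extension II gives a1 = a2 = a3 = 1, b1 = M − 1, b2 = 1, b3 = N, so a1·a2·a3 = 1 < (M − 1)·N = b1·b2·b3 for large M and N, and the mixed NE is a repeller. A minimal check (with the arbitrary illustrative choice M = N = 10):

```python
import numpy as np

M, N = 10, 10                                 # "M, N large": illustrative values
a1, a2, a3 = 1, 1, 1                          # identification with the RSP normal form above
b1, b2, b3 = M - 1, 1, N

A = np.array([[-a1,      a2 + b2, -b3],
              [-a1 - b1, b2,       a3],
              [ 0,       0,        0 ]])
print(A)                                      # reproduces the example matrix for these M, N
print(a1*a2*a3 < b1*b2*b3)                    # True: the mixed NE is a repeller
```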

Page 26:

Conclusion

NE is an elegant mathematical solution concept for games. NE are the fixed points of the BR correspondence, and they are fixed points of many evolutionary and learning processes.

But is NE a reasonable suggestion for playing a game, or a good prediction of the outcomes of learning?

In some situations YES: strict NE, zero-sum games, ESS, potential games.

In general, I doubt it.
