
Game Theory and Strategic Behavior

Subhash Suri

February 24, 2020

1 Game Theory

• The algorithmic questions studied so far can be loosely characterized as dealing with man vs. nature conflicts: we have to make decisions in the face of nature's uncertainty, i.e., unknown future events, unpredictable effects of medical treatments, etc.

• We now consider a different class of conflicts, involving man vs. man, which is both ubiquitous and the subject of a rich field called Game Theory. The field originated with the work of John von Neumann, a hugely influential mathematician, physicist, and pioneering computer scientist.

• To appreciate the broad reach of game theory, consider a routine stock market transaction: for a share of stock to be sold at, say, $50, the buyer must believe he can sell it later for, say, $60 to someone who believes he can sell it for $70 to someone who believes he can sell it to someone else for an even higher price, and so on. So the value of the stock is what people think other people think it's worth, with an infinite chain of such reasoning.

• These kinds of "I know that you know that I know that you know · · ·" chains can create all sorts of counter-intuitive and paradoxical results.

• For an illustration, consider the following guess-the-number game. Everyone in the room must guess a number (an integer) between 0 and 99, and write it down. I will collect all the numbers and compute the average A. Then, the winner is the one whose guess is closest to 2/3 of A. What will you guess?

– First, suppose you were the only perfectly rational person in the room; the rest didn't think too hard. You would guess that A will be about 50, so you should guess 2/3 × 50 = 100/3 ≈ 33.


– But if everyone thought this way, they would all reason the same way, and so the average would drift down from 50 to 33.

– You anticipate this behavior of the others in your second iteration of reasoning, so you will need to guess (2/3) × (2/3) × A.

– But this speculation-counter-speculation continues, and the end result is that all rational agents should guess 0.
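• A minimal sketch of this iterated reasoning (not part of the original notes): it simply applies the best response "guess 2/3 of the anticipated average" repeatedly, assuming every player starts from the naive guess of 50.

      # Iterated best-response ("level-k") reasoning in the guess-2/3 game.
      def iterated_guesses(start=50.0, rounds=10):
          guess, history = start, [start]
          for _ in range(rounds):
              guess = (2 / 3) * guess      # best response to the anticipated average
              history.append(guess)
          return history

      if __name__ == "__main__":
          for level, g in enumerate(iterated_guesses()):
              print(f"level {level}: guess about {g:.2f}")
          # The guesses shrink geometrically toward 0, the equilibrium guess.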

• As another example, consider the following setup for auctioning off a $1 bill. I plan to sell a $1 bill to the highest bidder using an open-cry auction, where you can bid multiple times, each time simply announcing your bid aloud, and bids are positive numbers. There are two rules to this auction:

1. At the end of the auction, the highest bidder "wins" the dollar and pays the value of his winning bid.

2. The second highest bidder also pays his bid price but receives nothing.

• How should you bid? What do you think is the highest amount one would bid in this game?

• The last condition of the game (the losing bidder's obligation to pay) is what complicates the game, and often generates interesting responses.

• For instance, suppose you think no rational player would pay more than 99 cents. Then reason from the perspective of the second highest bidder: rather than drop out and forfeit his entire bid, he would prefer to raise his bid above $1 and cut his losses. As a result, bidding and counter-bidding can continue well past $1, and in a perfectly rational world, there is no upper bound on the highest bid! Indeed, in academic lab experiments, bids as high as $4 to $5 have been typical!

• These competitive situations, where multiple intelligent players try to "out-think" each other, are reminiscent of computer science's famous halting problem, which asks one machine to simulate another machine (of equal complexity).

• Game Theory suggests a way out of these unending recursive speculation-counter-speculation loops: Nash Equilibrium.

• Before we can appreciate the significance of Nash's result, let us consider the approaches for solving two-person games that predated his work.

• Game Theory covers a very broad spectrum of cooperation and competition, but the starting point of the theory was a problem very similar to heads-up poker: a two-person contest where one person's gain is another's loss. These are called Two-Person Zero-Sum games.


• GT seeks to identify the so-called equilibrium for these games, which is a set of strategies, one per player, such that a unilateral deviation from that strategy is in neither player's self-interest. (That is, assuming that one player is playing its equilibrium strategy, the second player has no reason to switch; the second player reasons the same way; and so they both stay in equilibrium.)

2 The Basic 2-Person Zero-Sum Game

• A 2P–ZS game has two players whose interests are diametrically opposed: a win for one means a loss for the other. Many recreational or parlor games fit in this category: Tic-Tac-Toe, Chess, Tennis, Soccer, etc. This game models the purest form of conflict between two nations, corporations, businesses, enemies, etc.

• The Matrix Representation: The Normal Form. We can represent the strategies and payoffs for such a game as a matrix. A convenient, mnemonic naming for the players is Rose R (for row) and Colin C (for column). Let's consider a purely abstract toy example of such a game:

              Colin
            A      B
       A    2     -3
Rose   B    0      2
       C   -5     10

• What do the entries mean? R has three possible strategies (A, B, C), and C has two (A, B); the strategies A and B for R are not necessarily the same as for C; they are just names for each player's actions.

• Suppose R is the striker in soccer and C is the goalie. R can strike left, middle, or right. This particular goalie makes only two moves: dive left or dive right. These moves represent their strategies.

• Each entry represents the payoff to R—because the game is zero-sum, the payoff to C is the negative of this. So, for the strategy pair (A, B), R will lose $3 and C will win $3.

• We assume that both players have complete knowledge of the payoff matrix; they know all of each other's possible moves. But each has to decide on its move independently and simultaneously: they announce their moves, and the matrix tells them what the payoff is.


• Consider the thinking process of these rational players.

1. Rose likes payoff of 10, so she should play C, hoping Colin will play B.

2. But Colin can anticipate that and play A, giving Rose payoff of −5.

3. Anticipating that, Rose should play A, and since Colin anticipates that, he can play B.

4. And so we get into an endless chain of reasoning.

• What would you advise Rose and Colin to do, and what assurance can you offer to justify your recommendation?

• This is the crux of GT: are there choices for players that they will not regret?

2.1 Dominance and Saddle Points

• To explore the problem further, consider the following slightly larger game: Rose and Colin, both with 4 strategies or actions.

               Colin
           A     B     C     D
      A   12    -1     1     0
      B    5     1     7   -20
Rose  C    3     2     4     3
      D  -16     0     0    16

• Rose prefers large payoffs (e.g., 16), while Colin prefers small payoffs (e.g., -20). But some strategies are just not worth playing, ever. For instance, should Colin ever play strategy C? No! Because strategy B serves Colin's interest at least as well as C in every row, and strictly better in some.

• Dominance Principle. We call C a dominated strategy for Colin, and a rational player should never play a dominated strategy.

• We can therefore eliminate all dominated strategies from the matrix. While useful, this by itself does not solve the problem. In fact, there are no dominated strategies in the earlier 3 × 2 example.
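• A small sketch (not in the original notes) of checking for dominated column strategies in a payoff matrix, where entries are payoffs to the row player and the column player prefers smaller entries. Here "dominated" means never better and strictly worse in at least one row.

      # Find dominated columns in a zero-sum game matrix (payoffs to the row player).
      def dominated_columns(M):
          n_rows, n_cols = len(M), len(M[0])
          dominated = set()
          for j in range(n_cols):
              for k in range(n_cols):
                  if j == k or k in dominated:
                      continue
                  never_better = all(M[i][k] <= M[i][j] for i in range(n_rows))
                  sometimes_worse = any(M[i][k] < M[i][j] for i in range(n_rows))
                  if never_better and sometimes_worse:   # column k dominates column j
                      dominated.add(j)
                      break
          return dominated

      if __name__ == "__main__":
          game = [[12, -1, 1, 0],
                  [5, 1, 7, -20],
                  [3, 2, 4, 3],
                  [-16, 0, 0, 16]]
          print(dominated_columns(game))   # {2}: Colin's strategy C is dominated by B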


• But if we play this 4 × 4 game for a while, we notice that Rose uses strategy C frequently while Colin uses B frequently. Why is that? At a superficial level, we can call them the most cautious strategies:

1. Rose is assured of winning at least 2, while with her other strategies she can even lose sometimes.

2. Colin is assured of losing at most 2, while with his other strategies his losses can be much larger.

• But the important point is that Rose-C and Colin-B are in fact very good strategies even if Rose and Colin are not cautious players. There are two justifications for this:

1. (Rose C, Colin B) is an equilibrium point. This means that as long as Colin is playing B, Rose's optimal strategy is C, and similarly, as long as Rose plays C, Colin's optimal strategy is to play B. This is called an equilibrium because if both are playing this pair of strategies, neither one can benefit by switching.

2. By playing C, Rose can ensure she will win at least 2 no matter what Colin plays. By playing B, Colin can ensure he loses at most 2 no matter what Rose does. If Rose wins < 2, she would regret not playing C, and if Rose wins > 2, Colin would regret not playing B. Thus, by playing anything else, each risks a worse payoff than the equilibrium.

• According to Game Theory, (Rose C, Colin B) is the rational or optimal play for this game. The specific structure of the solution in this example is this: its payoff is simultaneously the minimum in its row and the maximum in its column.

• This means that if Rose knew Colin were playing B, row C maximizes her payoff, and if Colin knew Rose were playing C, column B maximizes his payoff. Strategies leading to this outcome are the players' best responses to each other.

• Saddle Point: An outcome in a matrix game is called a saddle point if the entry is the minimum in its row and the maximum in its column. If a matrix game has a saddle point, then the optimal strategy for both players is to play the saddle point.

• Value of a Game: If in a matrix game, there is a number v s.t.

– Rose has a strategy that guarantees a payoff of ≥ v, and

– Colin has a strategy that guarantees a payoff (to Rose) of ≤ v,

then v is called the value of the game.


• Clearly, if a game has a saddle point, then the payoff at the saddle point is the value of the game.
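• The defining property of a saddle point is easy to check mechanically. A small sketch (not from the original notes) that scans a payoff matrix for entries that are the minimum of their row and the maximum of their column:

      # Find all saddle points of a matrix game (payoffs to Rose).
      def saddle_points(M):
          saddles = []
          for i, row in enumerate(M):
              for j, v in enumerate(row):
                  col = [M[r][j] for r in range(len(M))]
                  if v == min(row) and v == max(col):
                      saddles.append((i, j, v))
          return saddles

      if __name__ == "__main__":
          game = [[12, -1, 1, 0],
                  [5, 1, 7, -20],
                  [3, 2, 4, 3],
                  [-16, 0, 0, 16]]
          print(saddle_points(game))   # [(2, 1, 2)]: (Rose C, Colin B), value 2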

2.2 State of Game Theory Before Von Neumann

• We have been lucky in our example, because not every game matrix has a saddle point. (And even if it does, the saddle point need not be unique.) The following is an example of non-unique saddle points:

                   Colin
            A     B     C     D        row-wise minimum
        -------------------------------
     A  |   4     2*    5     2*   |      2   <- max-min
     B  |   2     1    -1   -20    |    -20
Rose C  |   3     2*    4     2*   |      2   <- max-min
     D  | -16     0    16     1    |    -16
        -------------------------------
col max     4     2    16     2
                  ^           ^
                  |           |
                  min-max     min-max

• In this case, there are 4 saddle points (marked *), and they all have the same value, 2. However, not all outcomes with value 2 are saddle points—for instance, (B, A) is not a saddle point.


• In general, max-min also does not equal min-max. E.g.:

                 Colin
             A        B        row minimum
        ---------------------
     A  |    2       -3    |      -3
Rose B  |    0        2    |       0   <- max-min
     C  |   -5       10    |      -5
        ---------------------
col maximum  2       10
             ^
             min-max

• When this occurs, the game does not have a saddle point. Thus, in this game, Rose can guarantee that she wins at least 0, while Colin can ensure she does not win more than 2; but within that range, the game is open and unresolved.

• Also, the intersection of max-min and min-max, the outcome (B, A), is not an equilibrium: Rose would want to switch to strategy A.
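• A quick check (not in the original notes) of this max-min / min-max gap for the 3 × 2 game above, with payoffs to Rose:

      game = [[2, -3],
              [0, 2],
              [-5, 10]]
      maxmin = max(min(row) for row in game)                             # Rose's guarantee
      minmax = min(max(game[i][j] for i in range(3)) for j in range(2))  # Colin's guarantee
      print(maxmin, minmax)   # 0 2 -- they differ, so the game has no saddle point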

• This was the landscape in which John von Neumann's famous theorem appeared, the theorem that laid the foundation of GT.


3 Mixed Strategies and the Min-Max Theorem

• The question that von Neumann attempted to answer was this: in a game without saddle points, what strategy should a rational player choose?

• It is helpful to consider one of the simplest games without a saddle point: Matching Pennies.

             Colin
            H      T
        --------------
     H  |   1     -1  |
Rose T  |  -1      1  |
        --------------

• On a match, Rose wins 1; on a mismatch, Colin wins 1. But there is no saddle point in this game. (Check!) What would you do in such a game?

• When there is no saddle point, neither player wants to use a single strategy with certainty, since the other player could then take advantage of it. The only sensible thing is to play a random mix of strategies.

• Such a plan, which involves playing a mixture of strategies according to some probability distribution, is called a mixed strategy.

• Formally, a mixed strategy is a probability distribution over strategies. By contrast, playing one strategy with certainty (with probability p = 1) is called a pure strategy.

• Expected Payoff: When players play mixed strategies, their payoffs are random variables, and so we need to evaluate their expected payoffs. Suppose the probability of getting payoff a_i is p_i, for i = 1, 2, . . . , k. Then the expected payoff with respect to these probabilities is the weighted sum of the payoffs:

    a_1 p_1 + a_2 p_2 + · · · + a_k p_k = ∑_i a_i × p_i

• Von Neumann was able to prove that every two-person zero-sum game has a mixed-strategy equilibrium: namely, a pair of (mixed) strategies that are optimal for the two players.


3.1 The Min-Max Theorem for General 2-person-0-sum Games

• In general, the game matrix is an m × n matrix, where Rose has m strategies and Colin has n strategies. The main result of von Neumann and Morgenstern is the following.

• The MiniMax Theorem. Every m × n matrix game has a unique number v, the value of the game, and optimal strategies (pure or mixed) such that

1. if Rose plays her optimal strategy, her expected payoff is ≥ v, no matter what Colin does;

2. if Colin plays his optimal strategy, Rose's expected payoff is ≤ v, no matter what Rose does.
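• One standard way to make this theorem algorithmic is to compute the optimal strategies by linear programming. A hedged sketch (not from the notes), assuming numpy and scipy are available; payoffs are to Rose, and the program maximizes Rose's guaranteed expected payoff v:

      import numpy as np
      from scipy.optimize import linprog

      def solve_zero_sum(M):
          """Return Rose's optimal mixed strategy and the value of the game."""
          M = np.asarray(M, dtype=float)
          m, n = M.shape
          # Variables: p_1, ..., p_m (Rose's probabilities) and v; linprog minimizes, so use -v.
          c = np.zeros(m + 1); c[-1] = -1.0
          A_ub = np.hstack([-M.T, np.ones((n, 1))])        # for every Colin column j: v - p.M[:, j] <= 0
          b_ub = np.zeros(n)
          A_eq = np.zeros((1, m + 1)); A_eq[0, :m] = 1.0   # probabilities sum to 1
          b_eq = np.array([1.0])
          bounds = [(0, 1)] * m + [(None, None)]
          res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
          return res.x[:m], res.x[-1]

      if __name__ == "__main__":
          strategy, value = solve_zero_sum([[2, -3], [0, 3]])
          print(strategy, value)   # about (0.375, 0.625) with value 0.75, matching Section 3.2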

3.2 Analyzing the Two Person, Two Strategies Matrix

• In order to get insight into what these optimal mixed strategies look like, suppose for a minute we know what (mixed) strategy Colin is using. Say, Colin is playing (0.5, 0.5): equal probability for his two strategies.

             Colin
            A      B
        --------------
     A  |   2     -3  |
Rose B  |   0      3  |
        --------------

• How should Rose play? Rose can calculate the expected payoffs for her strategies:

    payoff(A) = 2 × (1/2) − 3 × (1/2) = −1/2
    payoff(B) = 0 × (1/2) + 3 × (1/2) = +3/2

So, clearly Rose should play B.

• This reasoning is called the expected value principle: if you know your opponent is playing a given mixed strategy, then you should play your strategy with the largest expected payoff.


• Question: Keeping Colin’s strategy fixed, can Rose do better by randomizing?

• Now, consider the situation from Colin’s view. He knows that mixed strategy (0.5, 0.5)can be exploited by Rose to get a payoff of 3/2. So, perhaps he should try a differentcombination. But, for any of his mixed strategies, Rose would like to choose herstrategy that gives her the best payoff. So, the best Colin can hope for is to choose astrategy that cannot be exploited by Rose—-i.e. one for which Rose’s expected payoffsare the same for all her moves.

• So, suppose Colin plays A with probability x and B with probability (1 − x), for x ∈ [0, 1]. Rose's expected payoffs are:

    Rose(A) = 2x − 3(1 − x) = 5x − 3
    Rose(B) = 0x + 3(1 − x) = −3x + 3

Setting them equal, we get 8x = 6, which gives x = 3/4. Thus, if Colin plays (3/4, 1/4), Rose's expected payoffs for both A and B are 3/4.

• Now, consider the same game from Rose’s perspective. She also knows that for any ofher mixed strategy, Colin will try to choose a column that minimizes her payoff. So,the best she can hope for is to neutralize this threat, by choosing a mixed strategywhere all actions of Colin are equalized.

• Suppose she plays the mixed strategy (y, 1 − y), for some y ∈ [0, 1]. Then,

    Colin(A) = 2y + 0(1 − y) = 2y
    Colin(B) = −3y + 3(1 − y) = 3 − 6y

Setting them equal, we get 8y = 3, which implies y = 3/8.

• So, if Rose plays (3/8, 5/8), no matter which strategy Colin plays, the payoff to Rose is at least

    2y = 3 − 6y = 3/4.

• Solution of the Game: Do you notice the similarity between this solution and a saddle point? Game theory would then prescribe that:

1. 3/4 is the value of this game,

2. (3/4 A, 1/4 B) is Colin's optimal strategy,

3. (3/8 A, 5/8 B) is Rose's optimal strategy, and

4. together, these two strategies are called the solution of the game.
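• The equalization argument above extends to any 2 × 2 zero-sum game without a saddle point. A small sketch (not from the notes) of that computation, using exact fractions; the closed-form expressions assume the game really has no saddle point, so the equalizing probabilities are valid:

      from fractions import Fraction

      def equalizing_solution(a, b, c, d):
          """Game matrix [[a, b], [c, d]] (payoffs to Rose).
          Returns (Rose's prob. of her first strategy, Colin's prob. of his first strategy, value)."""
          denom = Fraction(a - b - c + d)
          y = Fraction(d - c) / denom        # equalizes Colin's two columns against Rose's mix
          x = Fraction(d - b) / denom        # equalizes Rose's two rows against Colin's mix
          value = Fraction(a * d - b * c) / denom
          return y, x, value

      if __name__ == "__main__":
          print(equalizing_solution(2, -3, 0, 3))   # (3/8, 3/4, 3/4), as derived above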


4 Nash Equilibrium and Non-Zero-Sum Games

• The result of von Neumann and Morgenstern applied only to zero-sum games. A far more general result was established by John Nash in his 1951 article "Non-Cooperative Games," which showed that any game with a finite set of actions has at least one (mixed-strategy) equilibrium.

• In his honor, these are called Nash equilibria; Nash received the 1994 Nobel Prize, and was the subject of the best-selling book "A Beautiful Mind" and the movie of the same name, starring Russell Crowe.

• Nash equilibria have been widely used for:

1. analyzing hostile situations such as arms race

2. adoption of technical standards

3. traffic flow (Wardrop’s principle), auction theory, regulatory legislation, naturalresource management

4. occurrence of bank runs and currency crises

5. conflict mitigation by repeated interaction

6. to what extent people with different preferences can cooperate

7. analyzing marketing strategies, energy systems, transportation systems, evacuation problems

8. even penalty kicks in football

• Some toy examples of non-zero-sum games.

1. The coordination game is a symmetric 2-player game, with two strategies A and B for each player. If both players adopt strategy A (they coordinate), each receives the highest payoff of 4. The profile where both adopt strategy B is also a Nash equilibrium: each receives a smaller payoff, but neither has an incentive to unilaterally switch—such a change would reduce his payoff from 2 to 1. In the table below, each cell lists (Player 1's payoff, Player 2's payoff); a mechanical equilibrium check for this table is sketched right after this list.

                        P2 adopts A    P2 adopts B
      P1 adopts A         (4, 4)         (1, 3)
      P1 adopts B         (3, 1)         (2, 2)

2. The coordination game is also studied in various other settings, such as the stag hunt or technology adoption.
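• A minimal sketch (not from the notes) of the equilibrium check mentioned above: scan every cell of a two-player payoff table and keep those cells where neither player can gain by a unilateral switch.

      # Find pure-strategy Nash equilibria of a two-player game.
      # payoffs[i][j] = (row player's payoff, column player's payoff).
      def pure_nash(payoffs):
          rows, cols = len(payoffs), len(payoffs[0])
          equilibria = []
          for i in range(rows):
              for j in range(cols):
                  r, c = payoffs[i][j]
                  row_best = all(payoffs[k][j][0] <= r for k in range(rows))   # row player can't gain
                  col_best = all(payoffs[i][k][1] <= c for k in range(cols))   # column player can't gain
                  if row_best and col_best:
                      equilibria.append((i, j))
          return equilibria

      if __name__ == "__main__":
          coordination = [[(4, 4), (1, 3)],
                          [(3, 1), (2, 2)]]
          print(pure_nash(coordination))   # [(0, 0), (1, 1)]: both (A, A) and (B, B)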

• According to Nobel laureate Roger Myerson: "Nash equilibria has had a fundamental and pervasive impact in economics and social science which is comparable to that of the discovery of the DNA in biological sciences."


4.1 Prisoner’s Dilemma

• Informally, a strategy profile is a Nash equilibrium if no player can do better by unilaterally changing his or her strategy.

• To see what this means, imagine that each player is told the strategies of the others. Suppose then that each player asks themselves: "Knowing the strategies of the other players, and treating the strategies of the other players as set in stone, can I benefit by changing my strategy?"

– If any player's answer is Yes, then that set of strategies is not a Nash equilibrium.

– But if every player prefers not to switch (or is indifferent between switching and not), then the strategy profile is a Nash equilibrium.

– Thus, each strategy in a Nash equilibrium is a best response to all the others' strategies.

• It may seem paradoxical, but the equilibrium strategy—where no player is willing to change—does not always lead to the best outcome for the players. (A Nash equilibrium is not necessarily Pareto optimal.)

• This is illustrated by game theory's most famous example: the Prisoner's Dilemma.

Two prisoners are held in separate cells, interrogated simultaneously, and offered deals (lighter jail sentences) for betraying their fellow criminal. They can "cooperate" by not snitching, or "defect" by betraying the other. But there is a catch: if both players defect, then they both serve a longer sentence than if neither said anything. Lower jail sentences are interpreted as higher payoffs.

                 Cooperate      Defect
   Cooperate      (-1, -1)     (-3,  0)
   Defect         ( 0, -3)     (-2, -2)

• The prisoner’s dilemma matrix is similar to coordination game, but the max rewardfor each player (in this case, a minimum loss of 0) is obtained only when their decisionsare different. Each player improves his payoff by switching from cooperate to defect.This game has a single Nash equilibria” both players choose to defect.

• In fact, the defect-defect is not only an equilibrium strategy, it is a dominant strategy:given any strategy of the other prisoner, you are better of defecting.
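• A tiny check (not in the notes) of that dominance claim, using the payoff table above with jail terms written as negative payoffs:

      pd = {  # (my move, other's move) -> my payoff
          ("C", "C"): -1, ("C", "D"): -3,
          ("D", "C"):  0, ("D", "D"): -2,
      }
      for others_move in ("C", "D"):
          assert pd[("D", others_move)] > pd[("C", others_move)]
      print("Defect strictly beats Cooperate against either move, so (Defect, Defect) is the unique NE.")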


• This has emerged as one of the major insights of game theory: the equilibrium for a set of players, all acting rationally in their own interest, may not be the outcome that is best for the players.

• Algorithmic Game Theory, in keeping with the principles of computer science, takes this insight and attempts to quantify it using a measure called the price of anarchy.

4.2 Algorithmic Game Theory: Price of Anarchy

• The price of anarchy measures the gap between the quality of a coordinated solution and a competitive (Nash equilibrium) solution. In the prisoner's dilemma, this gap is effectively infinite. But in many problems, one can show that the price of anarchy is not so bad.

• Let us consider the problem of network traffic or transportation, using the following simple example.

• Suppose some number of cars travel from s to t. What is the expected distribution of traffic? We normalize the total traffic to 1, and then measure the latency (travel time) along each link as a function of the fraction x of the traffic using it, where 0 ≤ x ≤ 1. In our network, two edges, (v, t) and (s, w), have a fixed latency of 1; the edge (v, w) has latency 0; and the two edges (s, v) and (w, t) have latency function f(x) = x, meaning their latency grows linearly with the amount of traffic on them.


1. The traffic flow can be modeled as a "game" where every traveler has a choice of 3 strategies (possible routes): (s, v, t), (s, w, t), (s, v, w, t), and the cost of each strategy is the total latency of that route.

2. Suppose the traffic splits evenly along the two routes (s, v, t) and (s, w, t). In that case, the total latency is

    2 × 0.5 × (0.5 + 1) = 1.5

3. However, the Nash equilibrium for this game sends 100% of the traffic along the route (s, v, w, t), whose total latency is

    1 × (1 + 0 + 1) = 2

That is because as long as the flow on edge (s, v) is some value x < 1, any car traveling along the route (s, w, t) has an incentive to switch to (s, v, w, t) and reduce its latency from 1 + x to 2x < 1 + x.

• This example shows that the price of anarchy for the traffic game is at least 2/1.5 = 4/3.

• Remarkably, it is known that this is the maximum value of the price of anarchy in any network traffic game with linear latency functions! (That is, given any arbitrary network with linear latency functions, the price of anarchy is at most 4/3.)
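• A sketch (not from the notes) of the latency bookkeeping for this example, with the edge latencies assumed as described above:

      # Edge latencies: l(s,v) = x, l(v,t) = 1, l(s,w) = 1, l(w,t) = x, l(v,w) = 0,
      # where x is the fraction of total traffic using the edge.
      def total_latency(flow):
          """flow = fractions of traffic on the routes (svt, swt, svwt), summing to 1."""
          svt, swt, svwt = flow
          x_sv = svt + svwt                 # load on edge (s, v)
          x_wt = swt + svwt                 # load on edge (w, t)
          latency = {"svt": x_sv + 1, "swt": 1 + x_wt, "svwt": x_sv + 0 + x_wt}
          return svt * latency["svt"] + swt * latency["swt"] + svwt * latency["svwt"]

      coordinated = total_latency((0.5, 0.5, 0.0))   # 1.5, the coordinated optimum
      selfish = total_latency((0.0, 0.0, 1.0))       # 2.0, the Nash equilibrium flow
      print(coordinated, selfish, selfish / coordinated)   # ratio 4/3, the price of anarchy here

  The same function, with the third route forbidden, also gives the comparison used for the Braess paradox in the next subsection: the half-and-half split is then the equilibrium, with total latency 1.5.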

• The price of anarchy bounds have deep implications for urban planning and network infrastructure. Selfish routing's low price of anarchy may explain, for instance, why the Internet works as well as it does without any central administration: even if coordination were possible, the improvement would be small.

• The low price of anarchy in traffic networks can be interpreted as both good news and bad news: the good news is that people driving around without any coordination manage to get pretty close to optimal, with only a 33% slowdown. Interpreted another way, it is bad news for any utopian hopes that self-driving autonomous cars will suddenly improve traffic significantly.

4.3 Braess Paradox

• The network used in our price of anarchy example also exhibits a famous paradox in transportation.

• Suppose we remove the link (v, w) from the network.


• The Nash equilibrium for this smaller network sends half the traffic along the top path and the other half along the bottom path, for a total latency of 1.5 (each driver now experiences latency 1.5 instead of 2).

• That is, the equilibrium latency of the network improves when we remove a link! Or, conversely, the equilibrium latency of a road network can become worse when more roads are added!

• This is known as the Braess paradox.


5 Stability & Nash Equilibria: A Network Sharing Game

• We now consider another resource sharing problem in computer science within the GT framework.

• A number of users want to establish a connection to a common source node s in a network. For instance, each user is interested in a data stream generated at s, in a one-to-many form of communication called multicast.

• We represent the underlying network as a directed graph G = (V, E), with a cost ce ≥ 0 on each edge e. There is a designated source node s ∈ V, and a collection of k terminal (user) nodes t1, t2, . . . , tk ∈ V. (For simplicity, we equate nodes with their agents.)

• Each agent tj wants to form a path Pj from s to tj using as little cost as possible.

• Without any cost sharing and interaction among agents, each agent will simply use the shortest s–tj path as Pj, using the cost of edges as lengths.

• What makes the problem interesting is the prospect of agents sharing the cost of edges.

1. For each edge, an agent only pays its "fair share" of that edge's cost, namely, ce divided by the number of paths using it.

2. Thus, there is an incentive for agents to choose paths that overlap, so that they can benefit by splitting the costs of edges.

3. (Such a sharing model is appropriate when use by multiple agents does not degrade the quality of data transmission on that edge.)

• The interaction among the users can be modeled as a game where the strategy space of each agent tj is the set of all s-to-tj paths in G.

• In the following, we study whether the agents, each acting in its own self-interest, can ultimately reach a stable (NE) solution, and how its total cost compares to the optimum.

5.1 Best Response Dynamics and Nash Equilibria

• Best Response Dynamics: We are interested in the dynamic process where each agent continually improves its solution in response to changes made by the other agents; we refer to this process as best-response dynamics.

• Stable Solutions: A stable solution is one where the best response of each agent is to stay put, namely, a Nash equilibrium (NE).


• To see how the option of sharing affects agents' behavior, consider the two example networks below.

• In the left figure, each agent has two choices for a path: (1) a shared path through v, or (2) an unshared single-edge path. Assume each agent starts out with an initial path, but continuously evaluates the current situation to decide whether to switch or not.

• Suppose the agents start out using the outer paths. Then t1 sees no advantage in switching to the shared path, but t2 does. So t2 updates its path to the middle one, and once that happens, it is suddenly better for t1 to also switch, since its shared cost 2.5 + 1 beats its solo cost of 4. Once in this situation, neither t1 nor t2 has any incentive to switch, and this is a stable outcome.

• The second example illustrates the possibility of multiple NE. In this example, there are k agents, all resident at a common node t = t1 = t2 = · · · = tk. There are two parallel edges between s and t, of different costs.

– The solution in which all agents use the left edge is a NE, in which each agent pays (1 + ε)/k.

– The solution where all agents use the right edge and each pays k/k = 1 is also a NE.

• The fact that the second solution, where each agent pays a lot more, is also a NE exposes an important and subtle point about best-response dynamics.


– If the agents could somehow synchronously agree to move from the right to the left edge, they would all be better off.

– But under best-response dynamics, each agent is only evaluating the consequences of a unilateral move by itself, and therefore has no incentive to switch.

• In effect, an agent is not able to make any assumptions about the future actions of other agents.

– This assumption is not unrealistic. In fact, in an Internet setting, an agent may not even know anything about the other agents—and so it can only perform those updates that lead to an immediate improvement for itself.

• Social Optimum vs. Stable. The sense in which one NE is better than another can be quantified using the notion of a social optimum.

A solution is a social optimum if it minimizes the total cost to all the agents.

• We can think of the social optimum as a solution imposed by a benevolent central authority that views all agents as equally important and evaluates the quality of a solution by adding up their individual costs.

• In the examples above, both in (a) and (b), there is a social optimum that is also a NE. But in (b) there is a second NE that is not a social optimum (in fact, it has a much larger cost).

• Now consider the example in the figure below. If both agents start out on the middle path, then t1's best-response move is to switch to the left path, which improves its cost, but in the process increases the cost of t2 from 3.5 to 6! Once that happens, t2 will also move to the right path, giving a NE. This is the only stable solution, and it is not the social optimum.


• There are examples where the cost-increasing effects of best-response dynamics can be much worse. One such construction is shown in the figure below, where we have k agents, and each has the option to take a common outer path of cost 1 + ε, or to take its own alternate path. The alternate path for tj has cost 1/j.

• Now suppose we start with the solution in which all agents share the outer path. Each agent pays a 1/k fraction of the cost, and this is the social optimum, with minimum total cost (1 + ε).

• But even starting with this optimum solution, things begin to unwind rapidly under best-response dynamics. First, tk switches to its alternate path, since 1/k < (1 + ε)/k.

• Once that happens, only k − 1 agents share the cost of the outer path, and so t_{k−1} switches, because 1/(k − 1) < (1 + ε)/(k − 1). After that, t_{k−2} switches, and so on, until all agents are using their alternate paths. At this point, things come to a stop because:

The solution in which each agent is using its direct path is a Nash equilibrium,and moreover this is the unique NE for this problem instance.

• These examples suggest that one cannot really view the social optimum as the analog of the global minimum in a traditional search problem. In standard local search, the global minimum is always a stable solution, whereas in best-response dynamics the social optimum can be unstable.
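• A simulation sketch (not from the notes) of the k-agent example above: every agent starts on the shared outer path (cost 1 + eps, split equally), agent j's private path costs 1/j, and agents keep making best-response moves until no one wants to switch. In this instance no agent ever wants to rejoin the shared path, since even using it alone would cost 1 + eps, which is at least 1/j.

      def best_response_dynamics(k, eps=0.01):
          on_shared = set(range(1, k + 1))           # agents currently on the shared path
          changed = True
          while changed:
              changed = False
              sharers = len(on_shared)
              for j in sorted(on_shared, reverse=True):
                  if 1.0 / j < (1 + eps) / sharers:  # the private path is cheaper right now
                      on_shared.remove(j)
                      sharers -= 1
                      changed = True
          return on_shared

      if __name__ == "__main__":
          print(best_response_dynamics(8))   # set(): every agent ends on its own path
          # The resulting (unique) NE costs 1 + 1/2 + ... + 1/k = H(k) in total, versus the
          # social optimum of 1 + eps, matching the Theta(H(k)) price of stability below.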

5.2 Price of Stability

1. With this background, let us focus on two key questions related to best-response dynamics.

2. Existence of NE: We don't yet have a proof that there even exists a NE solution in every instance of our multicast problem.

• How do we argue that best-response dynamics would even terminate?

• Why couldn't we get into an infinite cycle, where agent t1 improves its solution at the expense of t2, and then t2 improves its solution at the expense of t1, and so on forever?

• So, at the very least, we want to argue that best-response dynamics leads to a NE, and to understand what is special about our routing problem that makes this happen.

3. The Price of Stability: So far we have considered NE in the role of "observers": we turn the agents loose on the graph from an arbitrary starting point and watch what they do. But if we take a "protocol designer's" point of view, where we want to design a process by which agents construct paths from s, we might want to pursue the following approach. We could propose a collection of paths, one for each agent, with two properties:

• The set of paths forms a Nash equilibrium solution, and

• Subject to the above, the total cost to all agents is as small as possible.

Of course, ideally we prefer the solution with the smallest cost, but if the social optimum is not a NE, then it won't be stable: agents will begin to deviate to improve their costs, unraveling the whole solution. These two properties represent our protocol's attempt to optimize in the face of stability, finding the best solution from which agents will not deviate.


4. Price of Stability: Given a problem instance, its price of stability is defined as the ratio between the cost of the best NE and the cost of the social optimum. This quantity reflects the blowup in cost we have to incur due to the requirement that our solution must be stable in the face of agents' self-interest.

5. In the following, we answer these questions for the multicast problem.

5.3 Nash Equilibrium of Multicast

• Best-response dynamics in the multicast network problem always terminates with a NE.

• The key to the proof is finding the right way to measure "progress": that is, we need to show that after each best-response move, some quantity decreases, and therefore the process must terminate.

• Unfortunately, as the previous examples have shown, the total cost of all the agents does not necessarily decrease in each step, so that quantity is not useful as a measure of progress.

• To get some insight, let us consider a simple example. Suppose agent tj is currently sharing with x other agents a path consisting of a single edge e. Now suppose tj decides to switch to a path consisting of a single edge f, which no agent is currently using. For this switch to be an improvement for tj, we must have cf < ce/(x + 1).

• As a result of this switch, the total cost summed over all agents goes up by cf: before the switch, the x + 1 agents together contributed the cost ce, and no one was incurring the cost cf. After the switch, the x remaining agents collectively pay ce and tj pays cf, for a net increase of cf.

• In order to view this as "progress," we need to offset the added cost cf against a drop in some "potential energy" of the solution: if the potential drops by ce/(x + 1) when tj leaves edge e (and rises by only cf when it starts using f), then we can consider tj's move as progress, since cf < ce/(x + 1) means the total potential decreases.

• We do this by defining a "potential" on each edge e, with the property that it drops by the amount ce/(x + 1) when the number of users on e decreases from x + 1 to x. Correspondingly, it increases by this amount when the number of users increases from x to x + 1.

• Therefore, we want a potential that decreases by ce/x when the first of the x agents using e drops out; it then decreases by ce/(x − 1) when the second user drops out, and so on. Setting the potential of edge e to

    ce × (1/x + 1/(x − 1) + · · · + 1/2 + 1) = ce · H(x)

is a simple way to accomplish this, where H(x) = 1 + 1/2 + · · · + 1/x is the harmonic number.


• More concretely, we define the potential of a set of paths P1, P2, . . . , Pk, denoted Φ(P1, P2, . . . , Pk), as follows. For each edge e, let xe be the number of paths using e. Then, with the convention that H(0) = 0,

    Φ(P1, P2, . . . , Pk) = ∑_{e ∈ E} ce · H(xe)
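• A small sketch (not from the notes) of this potential as code, using the k-agent example from earlier to check that an improving move lowers Φ; the edge names are made up for the illustration:

      def harmonic(x):
          return sum(1.0 / i for i in range(1, x + 1))      # H(0) = 0 by convention

      def potential(edge_costs, paths):
          """edge_costs: dict edge -> cost; paths: one set of edges per agent."""
          usage = {e: sum(e in p for p in paths) for e in edge_costs}
          return sum(c * harmonic(usage[e]) for e, c in edge_costs.items())

      if __name__ == "__main__":
          k, eps = 4, 0.01
          costs = {"outer": 1 + eps, **{f"own{j}": 1.0 / j for j in range(1, k + 1)}}
          everyone_shares = [{"outer"} for _ in range(k)]
          after_t4_moves = [{"outer"}, {"outer"}, {"outer"}, {"own4"}]
          # t4's switch is an improving move (1/4 < 1.01/4), so the potential must drop:
          print(potential(costs, everyone_shares) > potential(costs, after_t4_moves))   # True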

• We now show that Φ really works as a measure of progress:

Suppose the current set of paths is P1, P2, . . . , Pk, and agent tj updates its path from Pj to P′j, causing the old potential Φ to change to Φ′. Then, Φ′ < Φ.

• [Proof.]

1. Before tj switched, it was paying ∑_{e ∈ Pj} ce/xe, since it was sharing the cost of each edge e with xe − 1 others.

2. After the switch, it continues to pay this cost for edges in the intersection Pj ∩ P′j, but it also pays cf/(xf + 1) on each edge f ∈ P′j − Pj.

3. The fact that tj viewed this switch as an improvement means that

    ∑_{f ∈ P′j − Pj} cf/(xf + 1)  <  ∑_{e ∈ Pj − P′j} ce/xe

4. Let us now consider what happens to the potential function Φ. The only edges on which it changes are those in P′j − Pj or those in Pj − P′j.

5. On the former set, it increases by

    ∑_{f ∈ P′j − Pj} cf [H(xf + 1) − H(xf)] = ∑_{f ∈ P′j − Pj} cf/(xf + 1),

and on the latter set, it decreases by

    ∑_{e ∈ Pj − P′j} ce [H(xe) − H(xe − 1)] = ∑_{e ∈ Pj − P′j} ce/xe.

6. So, the criterion that tj used for switching paths is precisely that the total increase is strictly less than the total decrease, and hence the potential Φ decreases as a result of tj's switch.


7. This completes the proof.

• Now, since there are only a finite number of ways to choose a path for each agent tj, and the potential-decreasing lemma above guarantees that best-response dynamics can never revisit a set of paths P1, P2, . . . , Pk once it leaves it, we have proved the following:

Best response dynamics always leads to a set of paths that forms a Nashequilibrium.

• With a little more work, one can also show that the price of stability for the multicast problem is Θ(H(k)). The earlier example gives the lower bound, and the potential-function argument used above can be extended to derive an upper bound.


6 Stable Matching

• The Stable Matching problem originated in 1962, when David Gale and Lloyd Shapley, two mathematical economists, asked the following question:

Could one design a college admission, medical internship, or job recruiting process that was self-enforcing?

• What does that mean? Think of college admissions. In the senior year of high school, students apply to many colleges.

1. Each student has a clear preference order among colleges: he/she prefers some colleges more than others.

2. Similarly, colleges rank the applicants, preferring some to others.

3. However, neither side really knows who else is offering whom admission.

4. A student S may prefer college C, but if C has not made him an offer yet, he may end up accepting college D, only to learn later that C offers him admission, at which point C finds out that its top student chose some other college.

• This can, and does, become quite chaotic. In fact, medical schools began offering earlier and earlier admission, and asking for pre-commitments. Finally, the process degenerated to the point that some schools were reaching back to sophomores! This led to the creation of the National Resident Matching Program.

• What Gale and Shapley asked was: is there a matching in which no "pair has a regret"? That is, no student or college wishes to retract its choice.

• Lloyd Shapley and Alvin Roth received the 2012 Nobel Prize in Economics for the "theory of stable allocations and the practice of market design." (David Gale had died in 2008.)

6.1 Formalizing the Problem

• The problem is abstracted as one of arranging "marriages" in a society of n men (A, B, . . . , Z) and n women (a, b, . . . , z).

• Each man (woman) ranks all women (men), in descending order of preference.

• An example for n = 3, with the preference lists:

– A : c > a > b

– B : c > b > a


– C : a > b > c

– a : C > B > A

– b : B > A > C

– c : B > C > A

• A matching is a 1-to-1 correspondence between men and women (monogamous, heterosexual marriage).

• A pair (m, w) is unstable if m and w like each other more than their assigned partners.

• A matching is called unstable if it has an unstable pair (e.g., a risk of elopement).

• Does a stable matching always exist, for any set of input preferences?

• In our example, the matching {(A, c), (B, b), (C, a)} has an unstable pair (B, c).

• On the other hand, the matching {(A, b), (B, c), (C, a)} is stable.

6.2 Gale Shapley (GS) Algorithm

• Gale and Shapley showed that a stable matching always exists for any set of preferences. The proof is constructive: they gave an algorithm to find a stable matching.

• The GS algorithm works on the principle of "Man proposes, woman disposes."

• Each unattached man proposes to the highest-ranked woman on his list who has not already rejected him.

• If the man proposing to her is better (in her ranking) than her current mate, the woman dumps her current partner and becomes engaged to the new proposer.

• Since no man proposes to the same woman twice, the algorithm terminates, and the proof will show that the result is a stable matching.


• In particular, the algorithm has the following high-level description.

1. LIST : list of unattached men.

2. cur(m): highest ranked woman in m’s list who has not rejected him.

3. Initialize LIST = {1, 2, . . . , n} and cur(i) = M(i, 1).

4. Choose a man, say, Bob, from LIST . Bob proposes to Alice = cur(Bob).

5. If Alice is unattached, Bob and Alice are engaged.

6. If Alice is engaged to, say, John, but prefers Bob, she dumps John, and Bob and Alice are engaged. Otherwise, she rejects Bob.

7. The rejected man rejoins LIST , and updates his cur.

8. Output the engaged pairs when LIST = ∅.

• When executed on the earlier example, it produces the matching {(A, b), (B, c), (C, a)}.

• The algorithm terminates in O(n²) steps, since each step moves one cur pointer forward, and there are at most n² preferences.
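• A compact sketch (not part of the original notes) of the GS algorithm together with a stability check, run on the 3 × 3 example of Section 6.1; the function and variable names are invented for the illustration.

      def gale_shapley(men_prefs, women_prefs):
          rank = {w: {m: i for i, m in enumerate(prefs)} for w, prefs in women_prefs.items()}
          next_choice = {m: 0 for m in men_prefs}     # cur(m): index into m's preference list
          engaged_to = {}                             # woman -> man
          free_men = list(men_prefs)
          while free_men:
              m = free_men.pop()
              w = men_prefs[m][next_choice[m]]
              next_choice[m] += 1                     # m never proposes to w again
              if w not in engaged_to:
                  engaged_to[w] = m
              elif rank[w][m] < rank[w][engaged_to[w]]:
                  free_men.append(engaged_to[w])      # w dumps her current partner for m
                  engaged_to[w] = m
              else:
                  free_men.append(m)                  # w rejects m
          return {m: w for w, m in engaged_to.items()}

      def is_stable(matching, men_prefs, women_prefs):
          husband = {w: m for m, w in matching.items()}
          for m, prefs in men_prefs.items():
              for w in prefs:
                  if w == matching[m]:
                      break                           # m prefers no one below his own match
                  if women_prefs[w].index(m) < women_prefs[w].index(husband[w]):
                      return False                    # (m, w) is an unstable pair
          return True

      if __name__ == "__main__":
          men = {"A": ["c", "a", "b"], "B": ["c", "b", "a"], "C": ["a", "b", "c"]}
          women = {"a": ["C", "B", "A"], "b": ["B", "A", "C"], "c": ["B", "C", "A"]}
          matching = gale_shapley(men, women)
          print(sorted(matching.items()), is_stable(matching, men, women))
          # [('A', 'b'), ('B', 'c'), ('C', 'a')] True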

• It remains to prove that the matching is always stable.

6.3 Correctness of GS

• Proof by contradiction: suppose the resulting matching has an unstable pair (Dick, Laura). That is, Dick and Laura both prefer each other to their assigned mates.

• Since men propose in decreasing order of preference, and Dick ended up with someone he likes less than Laura, Dick must have proposed to Laura at some point.

• During the algorithm, Laura rejected Dick in favor of someone she prefers more.

• Since no woman ever switches to a man less desirable than her current partner, Laura's final partner must be more desirable to her than Dick.

• Thus, the pair (Dick, Laura) is not unstable, a contradiction.

6.4 Properties of GS Matching

• The GS matching has many interesting properties, in addition to just being stable.

• Fact 1. Any woman w remains engaged from the point at which she receives her first proposal. The sequence of partners she is engaged to improves with time.

• Fact 2. The sequence of women to whom a man m proposes gets worse and worse.


• If a man m is free at some point during the algorithm, then there must be a woman he has not proposed to yet. (Every woman who has received a proposal stays engaged; so if m had proposed to all n women, all n women, and hence all n men, would be engaged, contradicting that m is free.)

• The matching at the end is a perfect matching. In fact, all executions of the algorithm yield the same stable matching.

• The following characterization of the matching found may seem surprising:

Let best(m) denote the best possible woman that the man m can be married to in any stable matching. Then, the matching found is (m, best(m)).

• This is surprising because:

1. First, there is no reason to believe that (m, best(m)) is a matching at all, much less a stable one. After all, why couldn't more than one man have the same best valid partner?

2. Second, it shows that the simple GS algorithm finds such a matching, meaning there is no stable matching in which the men could hope to do better.

3. Finally, it shows that the order of proposals has no effect on the final matching.

• Proof.

1. We prove the man-optimality of GS by contradiction.

2. Consider the first time during the execution when some man m is rejected by his best valid partner w = best(m). (A valid partner is one that the man is paired with in some stable matching.)

3. The woman w rejects m in favor of, say, m′, whom she likes more than m. Thus, in w's preference list, we have m′ > m.

4. Let us mark this instant as R (for rejection), and return to it later to derive a contradiction.

5. By the hypothesis that w = best(m), there exists a stable matching S′ in which m and w are paired.

6. In this matching S′, the man m′ is paired with someone else, say w′ ≠ w. Thus, S′ contains the pairs (m, w) and (m′, w′).

7. We now consider what the execution of GS tells us about the preference of m′ between w and w′.

8. Since R was the first event in GS where any man was rejected by his best valid partner, at the instant R the following must be true:


(a) m′ has not been rejected by his best(m′), and

(b) m′ has been rejected by every woman on his list who comes before w, since he is engaged to w at instant R.

9. These two facts together imply that w ≥ best(m′) ≥ w′ in the preference list of m′, and hence w > w′ (since w ≠ w′). First, best(m′) cannot come before w on m′'s list: every woman before w has already rejected m′, while best(m′) has not, so w ≥ best(m′). Second, w′ is a valid partner of m′ because S′ is a stable matching, and so best(m′) ≥ w′.

10. But that means (m′, w) is an unstable pair for S′: both m′ and w prefer each other to their assigned partners in S′.

11. This contradicts the stability of S′. Thus, our initial assumption, that some man m is rejected by best(m), must be false.

• One can also show that in the GS stable matching, each woman is paired with her worst valid partner.
