apresentaรงรฃo do powerpointrafael/cursos/1s2017/...formalization and modeling definition 2.3...
TRANSCRIPT
Best-Response Mechanisms
Authors: Noam Nisan et al Innovations in Computer Science 2011
Sergio Zumpano Arnosti MSc Student
MO829 โ Algorithmic Game Theory
Introduction & Motivation
The setting
Games with incentive-compatible best-response
mechanisms
Formalization & Modeling
Examples
Conclusions
References
Summary
2 / 40
Game theory and economics
Often abstract how equilibrium is reached.
โlocally rationalโ actions -> mysteriously the
system reach a global equilibrium.
e.g.: repeated best-response dynamics.
Introduction & Motivation
3 / 40
Introduction & Motivation
Games with uncoupled incomplete information
+
Repeated best-respond
When is it best to best-respond?
Is it rational for players to repeatedly best-respond ?
Can a long-sighted player improve, in the long run,
over this repeated myopic optimization?
4 / 40
Attractive trait
To best-respond each player need only to know his own utility
function, as his best response does not depend on other
playersโ utility function, but only on their actions.
Best-response dynamics -> natural protocol
Gradual and limited sharing of information is an effort to
reach an equilibrium.
e.g., Internet routing
Introduction & Motivation
5 / 40
Base game
โข ๐-player (1,โฆ , ๐) base game ๐บ
โข Player ๐
Strategy space ๐๐ and ๐ = ๐1 ร โฆ ร ๐๐
Utility function ๐ข๐ ๐ ๐ข๐โ ๐กโ๐๐ก ๐ข1, โฆ , ๐ข๐ โ ๐ โ ๐1 ร . .ร ๐๐
Only knows his own utility function (private)
โข All playersโ utility functions -> full-information base game
โข Desire that the outcome be an equilibrium
The setting
6 / 40
Best-response mechanisms
โข Players take turns selecting strategies
โข At each discrete time step ๐ก some player ๐๐ก selects and
announces strategy ๐ ๐๐ก โ ๐๐๐ก
โข Choose a best-response to others announced strategies
โข Fully-specification
โข (1) the starting state
โข (2) order of player activations
โข (3) rule for breaking ties among multiple best responses
The setting
7 / 40
Goal
โข Identify interesting classes of (base) games for which
best-response mechanisms are incentive-compatible
โข When all other players are repeatedly best-reponding, then
a player is incentivized to do the same.
โข Consider games in which repeated best-response dynamics
do converge to an equilibrium.
The setting
8 / 40
Tie-breaking rules
โข When exists multiple best-responses
โข Tie-breaking rule must be โuncoupledโ depend only on the
player private information
โข For each player ๐
โข Fix an a-priori full order โบ๐ ๐๐ ๐๐
โข Instruct him to break ties between multiple best-
responses according to โบ๐
The setting
9 / 40
Games with incentive-compatible best-response mechanisms
C D
A 2, 1 0, 0
B 3, 0 1, 2
โข * Unique PNE -> (B, D) โข Best-response dynamics are guaranteed to converge to * implies the incentive-compatibility of best-resp. mechanisms
10 / 40
Games with incentive-compatible best-response mechanisms
C D
A 2, 1 0, 0
B 3, 0 1, 2
โข * Unique PNE -> (B, D) โข Best-response dynamics are guaranteed to converge to * implies the incentive-compatibility of best-resp. mechanisms
FALSE
10 / 40
Games with incentive-compatible best-response mechanisms
C D
A 2, 1 0, 0
B 3, 0 1, 2
โข * Unique PNE -> (B, D) โข Best-response dynamics are guaranteed to converge to * implies the incentive-compatibility of best-resp. mechanisms
FALSE
Repeated best-responding is not incentive compatible
in this game
10 / 40
Games with incentive-compatible best-response mechanisms
What traits must a game have for best-response dynamics to be incentive compatible?
11 / 40
Games with incentive-compatible best-response mechanisms
Never-Best-Response-Solvable (NBR-solvable) games with clear outcome
Strategies are iteratively eliminated if a best-response
never leads to them. Clear outcome: each player ๐ considers the game after the
other player have already eliminated strategies that can be eliminated regardless of what ๐ does. He will not be able to
do better than the outcome.
11 / 40
Games with incentive-compatible best-response mechanisms
Theorem (informal)
Let ๐บ be an NBR-solvable game with clear outcome. Then, for every starting point and every (finite or infinite) order of player activations with at least ๐ = ๐๐ โ ๐๐ โroundsโ it holds that: 1. Repeated best-response dynamics converges to a pure
Nash equilibrium ๐ โ ๐๐ ๐บ 2. Repeated best-response dynamics is incentive
compatible
12 / 40
Formalization and modeling
Definition 2.1 (tie-breaking rules/order)
Is a full order โบ๐ ๐๐ ๐๐ Multiple best-resposes: player ๐ should choose the highest (under โบ๐ ) best-response.
Definition 2.2 (never-best-response strategies)
๐ ๐ โ ๐๐ is a NBR under tie-breaking order โบ๐ ๐๐ ๐๐ if for all ๐ โ๐ there exists ๐ ๐
โฒ so that:
๐ข๐ ๐ ๐ , ๐ โ๐ < ๐ข๐ ๐ ๐โฒ, ๐ โ๐
OR both ๐ข๐ ๐ ๐ , ๐ โ๐ = ๐ข๐ ๐ ๐
โฒ, ๐ โ๐ ๐๐๐ ๐ ๐ โบ ๐ ๐โฒ
13 / 40
Formalization and modeling
Definition 2.3 (NBR-solvable games)
A game ๐บ is never-best-response-solvable under tie-breaking rules โบ1, โฆ , โบ๐ if there exists a sequence of eliminations of NBR strategies that results in a single strategy profile.
Definition 2.4 (shortest-elimination parameters)
Let ๐บ be an NBR-solvable game โข Exists ๐บ0, โฆ , ๐บ๐ โข ๐บ = ๐บ0 , in ๐บ๐ each player has only a single strategy and โi โ 0,โฆ , ๐ โ 1 , ๐บ๐+1 ๐๐ ๐๐๐ก๐๐๐๐๐ ๐๐๐๐ ๐บ๐ via removal of sets of NBR strategies. โข ๐๐บ: length of shortest sequence of games for ๐บ.
14 / 40
Formalization and modeling
Definition 2.5 (globally-optimal profiles)
๐ โ ๐ is globally optimal for ๐ if โ๐ก โ ๐, ๐ข๐ ๐ก < ๐ข๐(๐ ).
Definition 2.6 (clear outcomes)
โข Let ๐บ be an NBR-solvable game under โบ1, โฆ ,โบ๐. โข Let ๐ โ be the unique PNE under tie-breaking of ๐บ. โข ๐บ has a clear outcome if for every player ๐ there exists
an order of elimination of NBR strategies such that ๐ โ is globally optimal for ๐ at the first step in the elimination sequence.
โข The game obtained after the removal of all previously-eliminated strategies from ๐บ.
15 / 40
Formalization and modeling
Theorem 2.7 (incentive-compatible mechanisms)
โข Let ๐บ be an NBR-solvable game with a clear outcome ๐ โ โ ๐ under tie-breaking rules โบ1, โฆ , โบ๐.
โข Let ๐ be a best-response mechanism for ๐บ with at least ๐ = ๐๐บ "๐๐๐ข๐๐๐ โ.
1. ๐ converges to ๐ โ 2. ๐ is incentive compatible.
16 / 40
Formalization and modeling
Proof sketch (convergence):
โข ๐บ: NBR-solvable
โข ๐บ0, โฆ , ๐บ๐ , ๐บ = ๐บ0 ๐๐๐ โ๐ โ 0, โฆ , ๐ โ 1 , ๐บ๐+1 ๐๐ ๐๐๐ก๐๐๐๐๐
๐๐๐๐ ๐บ๐ ๐ฃ๐๐ ๐๐๐๐๐ฃ๐๐ ๐๐ ๐๐ต๐ ๐ ๐ก๐๐๐ก๐๐๐๐๐ .
โข Consider the first round of a best-response mechanism.
โข Consider ๐ โ ๐ , โ ๐ ๐ โ ๐๐ that is NBR in ๐บ = ๐บ0.
โข Once j is activated, ๐ ๐ will never be selected thereafter. After the first round,
no NBR strategy in ๐บ0 will be played ever again and hence the game is
effectively equivalent to ๐บ1.
โข Same argument for the next rounds, mimic the elimination sequence in each
strategy until reach ๐บ๐ , whose unique strategy tuple ๐ โ is the unique PNE
under tie-breaking of ๐บ. 17 / 40
Formalization and modeling
Proof sketch (incentive compatibility):
โข ๐ฟ๐๐ก ๐ be a player that deviates from repeated best-response and strictly gains
from doing so.
โข ๐บ is NBR-solvable โ โ a (player-specific) order of elimination of NBR
strategies such that ๐ โ is globally optimal for ๐ at the first step of elimination
sequence (the game obtained after the removal of all previously-eliminated
strategies from ๐บ).
โข ๐บ0, โฆ , ๐บ๐ , ๐บ = ๐บ0 ๐๐๐ โ๐ โ 0, โฆ , ๐ โ 1 , ๐บ๐+1 ๐๐ ๐๐๐ก๐๐๐๐๐ ๐๐๐๐ ๐บ๐ ๐ฃ๐๐
๐๐๐๐๐ฃ๐๐ ๐๐ ๐๐ต๐ ๐ ๐ก๐๐๐ก๐๐๐๐๐ (under tie-breaking).
โข Let ๐ก๐ be the index of the first game in sequence in which ๐โฒ๐ strategies are
eliminated in that order.
18 / 40
Formalization and modeling
Proof sketch (incentive compatibility):
โข All player but ๐ are repeatedly best-responding and in the ๐ก๐ โ 1 first steps of
the elimination sequence no strategy in ๐๐ is removed.
โข The same arguments for convergence can be used to show that after ๐ก๐ โ 1
rounds the game is effectively equivalent to ๐บ๐ก๐ , regardless of the actions of
player ๐.
โข However, in that game, ๐ can do no better than ๐ โ.
19 / 40
Formalization and modeling
Proof sketch (incentive compatibility):
โข All player but ๐ are repeatedly best-responding and in the ๐ก๐ โ 1 first steps of
the elimination sequence no strategy in ๐๐ is removed.
โข The same arguments for convergence can be used to show that after ๐ก๐ โ 1
rounds the game is effectively equivalent to ๐บ๐ก๐ , regardless of the actions of
player ๐.
โข However, in that game, ๐ can do no better than ๐ โ.
A CONTRADICTION
19 / 40
Examples
Game Description
Stable-roommates
Students must be paired for the purpose of sharing dorm rooms. The objective is to find a โstable matchingโ.
Cost-sharing
Cost of some public service must be distributed between self-interested users.
Internet routing
BGP establishes routes between the smaller networks. Abstract and prove that BGP is incentive compatible in realistic environments.
Congestion control
TCP handles congestion on the Internet. Increase the transmission rate until congestion and then decrease. Show that such behavior is in equilibrium.
20 / 40
Examples - Stable-roommates
๐ students 1,โฆ , ๐
Each has a private strict ranking of the others and prefers
being matched.
A stable matching is not guaranteed to exist in general and,
if a stable matching does exist, existing algorithms for
reaching it are not incentive compatible.
The authors observed environments where a stable
matching is guaranteed to exist and can be reached in an
incentive compatible manner.
21 / 40
Examples - Stable-roommates
Intern-hospital matchings
โข The โstudentsโ are divided into two disjoint sets (interns and hospitals).
โข Hospitals have the same ranking of interns.
Correlated markets
โข The โstudentsโ are vertices in a complete graph. โข Every edge has a unique โweightโ. โข The โheavierโ the edge connecting students the higher
that student ranks the other student.
Two well-known special cases:
22 / 40
Examples - Stable-roommates
Stable-roommates games
โข Players: students โข ๐๐: ๐
โฒ๐ strategy space, the set of all students ๐ โ ๐ โข ๐ผ๐(๐): ๐โฒ๐ rank in student ๐โฒ๐ ranking (lowest -> rank 1) โข โ๐ = ๐ 1, โฆ , ๐ ๐ โ ๐:
๐ข๐ ๐ = ๐ผ๐ ๐ โ ๐ ๐ = ๐ ๐๐๐ โ๐ โ ๐ ๐ ๐ข๐โ ๐กโ๐๐ก ๐ ๐ = ๐ ๐๐๐ ๐ผ๐ ๐ > ๐ผ๐(๐)
โข Otherwise: ๐ข๐ ๐ = 0
23 / 40
Examples - Stable-roommates
Theorem (stable-roommates games)
For every stable-roommates game ๐บ it holds that in both hospital-intern matchings and correlated markets
โข ๐บ is NBR-solvable โข ๐บโฒs unique PNE is a stable matchings โข ๐๐บ โค ๐
24 / 40
Examples - Stable-roommates
Proof sketch:
โข Cycle-free: if there is no sequence of roommates ๐1, ๐2, โฆ , ๐๐ of length ๐ > 2
such that each student ๐๐ ranks student ๐๐+1 higher than student ๐๐โ1 (mod ๐).
โข Cycle-free game has an elimination sequence:
โข Start with some arbitrary student ๐1
โข Construct a sequence ๐1, ๐2, โฆ in which ๐๐+1 is the sudent ๐๐ prefers
โข Number of students is finite and so the sequence must repeat. Since the
game is cycle-free, the cycle must be of length 2 โถ located two students
that desire each other the most.
โข We can eliminate for each of the two the strategies of proposing to any other
student โถ Maximal utility by proposing to each other.
25 / 40
Examples - Stable-roommates
Proof sketch:
Ramains to notice that both environments are cycle-free:
(1) Hospitals and interns
โข Any cycle of players will have to include a hospital after a desired intern
and before a less desired one.
(2) Correlated markets
โข Any cycle of nodes in the graph must include an edge with a lower weight
that appears after an edge with a higher one.
(1), (2) the preferences do not induce a cycle in the matching graph
โด The mechanism is a best-response mechanism for stable-roommates games Theorem 2.7 โถ implements incentive-compatibility
26 / 40
Examples - Internet routing
The network is an undirected graph ๐บ = (๐, ๐ธ)
๐: ๐ source nodes 1,โฆ , ๐ and a unique destination node ๐
Each has a private strict ranking of all simple (loop-free)
routes between itself and the dest. ๐.
Under BGP, each source repeatedly examines its
neighboring nodesโ most recent announcements. Forwards
through the neighbor whose route it likes the most, and
announces its newly chosen route.
BGP converges to a โstableโ tree is the subject of
networking research. 27 / 40
Examples - Internet routing
Theorem (Levin et al [2])
BGP is incentive-compatible in ex-post Nash in networks for which No Dispute Wheel holds.
28 / 40
Examples - Internet routing
BGP games
โข Players: source nodes in ๐ โข ๐๐: ๐
โฒ๐ outgoing edges in ๐ธ
โข ๐ = (๐1, โฆ , ๐๐): vector of source nodesโ traffic forwarding decisions (strategies)
โข ๐ข๐(๐ = ๐1, โฆ , ๐๐ ): ๐โฒ๐ rank for the simple route from ๐ to ๐
under ๐ (lowest -> rank 1)
โข Otherwise:
๐ข๐ ๐ = 0
29 / 40
Examples - Internet routing
Theorem (BGP games)
For a BGP game ๐บ it holds that: โข ๐บ is NBR-solvable โข ๐บโฒ๐ unique PNE is a stable routing tree โข ๐๐บ โค ๐.
30 / 40
Examples - Internet routing
Proof sketch:
โข Elimination order: locate a node that can guarantee its most preferred route
(current subgame) and eliminate all other routing actions for it.
โข Start with an arbitrary node ๐0 with at least 2 action.
โข Let ๐ 0 be ๐0โs most preferred existing route to ๐.
โข Let ๐1 be the vertex closest to ๐ on ๐ 0 , with two available actions in the
current subgame, such that ๐1 prefers some other route ๐ 1 that leads ๐1
to ๐.
โข Choose ๐2 closest to ๐, ๐ 2 ๐กโ๐๐ก ๐๐ ๐2โs most preferred.
โข Continue to choose ๐3, ๐4, โฆ (finite number of vertices).
โด We are able to find a node that can guarantee its most preferred route and
continue with the elimination, until there are no more nodes with actions. 31 / 40
Examples - Congestion control
Handled via combination of transmission-rate-adjustment
protocols at the sender-receiver level (TCP) and queueing
management policies.
TCP is notoriously not incentive compatible.
Godfrey et al [3] analyses incentives in TCP-inspired
environments.
32 / 40
Examples - Congestion control
The network is an undirected graph ๐บ = (๐, ๐ธ).
๐ ๐ : capacity function that specifies the capacity for each
edge ๐ โ ๐ธ.
๐ source-target pairs of vertices ๐ผ๐ , ๐ฝ๐ that aims to send
traffic along a fixed route ๐ ๐ ๐๐ ๐บ.
๐ผ๐ can select transmission rates in the interval 0,๐๐ .
๐๐ is ๐ผ๐ โs private information and wishes to maximize its
achieved throughput.
Congestion: sum of incoming flows exceeds edgeโs capacity,
excess traffic must be discarded. 33 / 40
Examples - Congestion control
Strict-Priority-Queuing (SPQ)
โข โ๐ โ ๐ธ there is an edge-specific order over source nodes. โข Sharing: the most highly ranked source whose route traverses
the edge gets its entire flow sent along the edge (up to ๐(๐)); ununsed capacity is allocated to the second most highly ranked.
Weighted-Fair-Queueing (WFQ)
โข โ๐ โ ๐ธ, each source node ๐ผ๐ has weight ๐ค๐(๐) at ๐.
โข Allocated capacity: ๐ค๐
ฮฃ๐๐ค๐๐(๐).
โข Special case: โ๐ โ ๐ธ, โ๐ โ ๐ ,๐ค๐ ๐ = 1 is called โfair queuingโ (FQ).
Two capacity-allocation schemes:
34 / 40
Examples - Congestion control
Godfrey et al [3] considers a TCP-like protocol called
Probing-Increase-Educate-Decrease (PIED). PIED is shown
to be incentive compatible in SPQ and WFQ.
Theorem 3.7 (Godfrey et al [3])
โข PIED is incentive compatible in networks in which all edges use SPQ with coordinated priorities.
Theorem 3.8 (Godfrey et al [3])
โข PIED is incentive compatible in networks in which all edges use WFQ with coordinated weights (and so if all edges use FQ then PIED is incentive compatible)
35 / 40
Examples - Congestion control
TCP games
โข Players: source nodes โข ๐๐ = [0,๐๐]: ๐
โฒ๐ strategy space โข ๐ = (๐1, โฆ , ๐๐): vector of source nodesโ transmission rates
(strategies) โข ๐ข๐(๐ ): is ๐ผ๐โs achieved throughput in the unique traffic-flow
equilibrium point of the network for ๐ . โข Godfrey et al [3] shows that such a unique point exists for SPQ
and WFQ. โข Tie-breaking rules:
โ๐ , ๐ก โ ๐๐ , ๐ โบ๐ ๐ก โ ๐ > ๐ก
36 / 40
Examples - Congestion control
Theorem (TCP games)
For every TCP game ๐บ such that all edges use SPQ with coortinated priorities, or all edges use WFQ with coordinated weights, it holds that:
โข ๐บ is NBR-solvable under tie-breaking rules. โข ๐บโs unique PNE under these tie-breaking is a stable flow
pattern. โข ๐๐บ โค ๐.
The proof will be only for the case of Weighted-Fair-Queuing, with equal weights.
37 / 40
Examples - Congestion control
Proof sketch:
โข For each edge ๐, the share of each flow as ๐ฝ๐ = ๐๐ ๐๐ .
โข Construct an elimination sequence:
โข Let ๐โ be the edge with the minimal ๐ฝ.
โข Each flow on this edge is guaranteed ๐ฝ๐โ traffic and at least that amount on
all other edges.
โข Therefore is possible to eliminate all actions of transmitting less than ๐ฝ๐โ .
โข If player ๐ eliminates actions below ๐ฝ๐โ last among players that go through
๐โ, then he does so in game in which the final profile is optimal for him.
โด The game has a clear outcome. Theorem 2.7 implies a result that is similar
in spirit to the two theorems of Godfrey et al [3].
38 / 40
Conclusions
It was possible to explore when such locally-rational
dynamics are also globally rational.
Results along the article give an incentive to think in new
structures of existing protocols/mechanisms and provide
new insights into the design of them.
It was interesting to see that, in some specific conditions,
real environments with repeated best-response mechanism
can be incentive compatible.
39 / 40
References
[1] N. Nisan, M. Schapira, G. Valiant, A. Zohar โBest-Response
Mechanismsโ, Innovations in Computer Science - ICS 2011, pages 155-
165, 2011.
[2] H. Levin, M. Schapira, and A. Zohar. Interdomain routing and
games. In STOC, pages 57-66, 2008.
[3] P. B. Godfrey, M. Schapira, A. Zohar, and S. Shenker. Incentive
compatibility and dynamics of congestion control. SIGMETRICS
Perform. Eval. Rev., 38(1):95-106, 2010.
40 / 40