Game-Theoretic Recommendations: Some Progress in an Uphill Battle
Moshe Tennenholtz
Technion—Israel Institute of Technology

Page 1:

Game-Theoretic Recommendations:

Some Progress in an Uphill Battle

Moshe Tennenholtz

Technion—Israel Institute of Technology

Page 2:

When CS meets GT

• Game theory has become a central tool for the analysis of multi-agent systems in many areas of computer science.

• The question is whether game theory provides answers to the two central challenges of multi-agent systems:

1. Providing a recommendation to an agent on its course of action in a multi-agent encounter (the agent perspective)

2. Leading a group of rational agents to perform a socially desired behavior (the mediator perspective)

Page 3:

When CS meets GT: The agent’s challenge

• We are given a multi-agent environment.

• How should an agent choose its action?

Page 4:

Decision making in multi-agent systems: load balancing

Which route should an agent take? Taking the route that goes through s is slower than taking the route that goes through f (speed α vs. speed 1), but service is split when a route is shared among agents.

[Diagram: Agent 1 and Agent 2 each choose the fast link f or the slow link s to reach the Target]

Page 5:

Decision making in multi-agent systems: the game-theoretic approach

• Agents behave according to equilibrium.

• When α ≥ 0.5, each agent selecting f with probability (2−α)/(1+α) is an equilibrium.

Agent 1 / Agent 2:
        f            s
f       1/2, 1/2     1, α
s       α, 1         α/2, α/2

Page 6:

Decision making in multi-agent systems: the agent perspective

• Equilibrium may be of interest from a descriptive perspective, but would you recommend to your agent the strategy prescribed by a Nash equilibrium?

Page 7:

Decision making in multi-agent systems: a robust normative approach

• Safety-level strategy: choose f with probability α/(1+α).

• The safety-level strategy does not define a Nash equilibrium, but its expected payoff is the same as in a Nash equilibrium!

Agent 1 / Agent 2:
        f            s
f       1/2, 1/2     1, α
s       α, 1         α/2, α/2
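As a sanity check (ours, not part of the original talk), a few lines of Python confirm the claim numerically: for the matrix above, the safety-level strategy guarantees exactly the symmetric mixed-equilibrium payoff 3α/(2(1+α)).

# Minimal numeric check of the claim above (not from the talk).
def nash_payoff(alpha):
    # Symmetric mixed NE (valid for alpha >= 0.5): play f with prob (2 - alpha) / (1 + alpha).
    p = (2 - alpha) / (1 + alpha)
    # Expected payoff of agent 1 when both agents mix with probability p.
    return p * p * 0.5 + p * (1 - p) * 1.0 + (1 - p) * p * alpha + (1 - p) * (1 - p) * alpha / 2

def safety_level_payoff(alpha):
    # Safety-level strategy: play f with prob alpha / (1 + alpha);
    # the guaranteed payoff is the worse of the opponent's two pure responses.
    q = alpha / (1 + alpha)
    vs_f = q * 0.5 + (1 - q) * alpha        # opponent plays f
    vs_s = q * 1.0 + (1 - q) * alpha / 2    # opponent plays s
    return min(vs_f, vs_s)

for alpha in (0.5, 0.7, 0.9, 1.0):
    print(alpha, round(nash_payoff(alpha), 6), round(safety_level_payoff(alpha), 6))
# Both columns equal 3*alpha / (2*(1+alpha)).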

Page 8:

The Game Theory Literature and Robust Algorithms

• One powerful attack on Nash equilibrium is in an example by Aumann:

• The maximin value is as high as what we obtain in the Nash equilibria. Why should one therefore use the strategy that corresponds to Nash equilibrium?

Agent 1 / Agent 2:
        c       d
a       2,6     4,2
b       6,0     0,4

Page 9:

The Game Theory Literature and Robust Algorithms

• Our example of robust decision-making in the context of decentralized load balancing is of a similar flavor to Aumann's criticism.

• Our aim, however, is a normative one: our findings serve as a useful approach to agent design in multi-agent systems!

• We will also extend the concept to C-competitive strategies, allowing surprisingly positive results for a wide range of settings.

Page 10:

C-competitive strategies

• Given a game G, and a set of strategies S for the agent.

• A mixed strategy t ∈ Δ(S) will be called a C-competitive safety-level strategy if the ratio between the expected payoff of the agent in a Nash equilibrium and its expected payoff under t is bounded by C.

• In most cases we refer to the best Nash equilibrium.

• If C is small we have a good suggestion for our agent!
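In symbols (our paraphrase, not from the slides), writing U_i^{NE} for the agent's expected payoff in the (best) Nash equilibrium and measuring t by its safety level as on the previous slides:

\[
\frac{U_i^{\mathrm{NE}}}{\min_{s_{-i}} \ \mathbb{E}_{s_i \sim t}\left[ u_i(s_i, s_{-i}) \right]} \;\le\; C .
\]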

Page 11:

C-competitive strategies for decentralized load balancing:

[Diagram: agents 1,…,n (among them agents i and j) each choose link f or link s to reach the Target]

Many agents (rather than only two) attempt to reach the target:

Page 12:

C-competitive strategies for decentralized load balancing: a 9/8 ratio

Theorem: There exists a 9/8-competitive safety strategy for the extended decentralized load balancing setting.

• The 9/8-competitive strategy:

choose f with probability α/(1+α)

Page 13:

C-competitive strategies for decentralized load balancing: many links, arbitrary speeds

• m links to the target, normalized to have speeds 1 = α1 ≥ α2 ≥ … ≥ αm > 0.

• Theorem: There exists a C-competitive safety strategy for the extended load balancing setting, when we allow m (rather than only 2) parallel communication lines and arbitrary αi, where

C = (Σ_{i=1..m} αi) · (Σ_{i=1..m} Π_{j≠i} αj) / (m² · Π_{j=1..m} αj).
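For illustration (ours, assuming the reconstruction of the ratio above), the bound is easy to evaluate; the products cancel, leaving C = (Σ αi)(Σ 1/αi)/m².

def competitive_ratio(speeds):
    # C = (sum of speeds) * (sum of inverse speeds) / m^2  (our simplification of the bound above)
    m = len(speeds)
    return sum(speeds) * sum(1.0 / a for a in speeds) / (m * m)

print(competitive_ratio([1.0, 0.5]))         # 1.125 = 9/8, matching the previous slide
print(competitive_ratio([1.0, 1.0, 1.0]))    # 1.0: identical links give ratio 1

speeds = [1.0, 0.8, 0.4]
Q = sum(speeds) / len(speeds)
print(competitive_ratio(speeds) <= Q / min(speeds))   # True: consistent with the k-regular bound on the next slide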

Page 14:

Decentralized load balancing: many parallel links

• The average network quality is Q = (Σ_{i=1..m} αi)/m.

• A network is k-regular if Q/αm ≤ k.

• Theorem: Given a k-regular network, there exists a k-competitive safety strategy for the extended load balancing setting, where we allow m (rather than just 2) parallel links.

Page 15:

Position Auctions

[Illustration: a search-results page, with the sponsored ad positions highlighted]

Page 16:

Position Auctions – Model

• k – #positions, n – #players, n > k

• vi – player i's valuation per click

• αj – position j's click-through rate, α1 > α2 > … > αk

• Allocation rule – the jth highest bid wins the jth highest position. Ties are broken by a fixed-order priority rule.

• Payment scheme – pj(b1,…,bn) is position j's payment under bid profile (b1,…,bn).

• Quasi-linear utilities: the utility for player i, if assigned to position j and paying qi per click, is αj(vi − qi).

• Outcome(b) = (allocation(b), position payment vector(b))

Page 17:

Some Position Auctions

• VCG: pj(b) = Σ_{l≥j+1} b(l) · (α_{l−1} − α_l) / αj

• Self-price: pj(b) = b(j)

• Next-price: pj(b) = b(j+1)

A variant of next-price is what is used in practice.

Page 18:

C-competitive Strategies for Next-Price Position Auctions

• While there are no C-competitive strategies, for a constant C, in the complete information setting, such strategies exist in realistic incomplete information settings!

• We devise a robust strategy that competes with the best equilibrium expected payoff for any given agent valuation.

• For example, with uniform distribution on valuations, and linear click-rates, we get a 2-competitive strategy.

• For an agent with small valuations a “close to 1” competitive safety strategy is provided!

Page 19:

From Robust Agents to Prediction-Based Agents

• What should we do if the robust approach is not satisfactory, and there are no useful competitive safety strategies?

• We suggest the use of ML techniques to predict opponent behaviors, and contrast them with existing techniques in cognitive psychology and experimental economics.

• Surprisingly positive results are obtained, as validated in experiments with human subjects, and compared to leading approaches in cognitive psychology and experimental economics.

Page 20:

Action Prediction in One-Shot Games:

[Matrix: rows are players P1,…,Pn, columns are games G1,…,Gm; all entries are observed data except a single query cell, player P1's action in one game, marked "?"]

Page 21:

Prediction rules

• We are trying to predict the behavior of player p in game g.

• A prediction rule maps the strategies chosen by all other players in all games and the strategies player p chose in all games but g, to a probability distribution over all strategies in game g.

Page 22:

Existing Approaches

• Population statistics

• Cognitive methods (agent modelling):
 – Cognitive hierarchy (Camerer et al.)
 – Agent types (Costa-Gomes et al.)

Page 23:

Existing Approaches

• Population statistics

• Cognitive methods (agent modelling):
 – Cognitive hierarchy (Camerer et al.)
 – Agent behavioral types (Costa-Gomes et al.)

• We offer: a new ML approach

Page 24:

Population Statistics

• The probability that an action will be selected in a game is estimated by the frequency with which it has been selected by the other agents in that game.

Page 25:

Cognitive hierarchy (Camerer et al.)

• The cognitive hierarchy model defines a player's type by the number of reasoning levels the player uses.

• A "type 0" player does not use any level of reasoning: it randomly chooses a strategy.

• Players doing one or more steps of thinking assume other players use fewer thinking steps.

• They have an accurate guess about the proportions of players who use fewer steps than they do.

• They best respond to the probability distribution induced by the above mentioned proportion.
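For concreteness, here is a compact Python sketch (ours, not the authors' code) of the Poisson cognitive-hierarchy prediction for a two-player matrix game; tau is the mean number of thinking steps (the results below use an estimated tau as well as tau = 1.55).

import math

def _poisson(k, tau):
    return math.exp(-tau) * tau ** k / math.factorial(k)

def _best_response(util, belief):
    # Pure best response to a mixed opponent strategy (ties: lowest index).
    n = len(util)
    vals = [sum(belief[j] * util[a][j] for j in range(len(belief))) for a in range(n)]
    best = max(range(n), key=lambda a: vals[a])
    return [1.0 if a == best else 0.0 for a in range(n)]

def ch_prediction(row_u, col_u, tau=1.55, max_level=10):
    """Poisson cognitive-hierarchy prediction of the row player's action distribution.

    row_u[i][j] : row player's payoff for row i against column j
    col_u[j][i] : column player's payoff for column j against row i
    """
    n_rows, n_cols = len(row_u), len(row_u[0])
    row_levels = [[1.0 / n_rows] * n_rows]   # level-0 players randomize uniformly
    col_levels = [[1.0 / n_cols] * n_cols]
    for k in range(1, max_level + 1):
        w = [_poisson(h, tau) for h in range(k)]
        s = sum(w)
        # A level-k player's (normalized) belief about the lower-level opponents' play.
        col_belief = [sum(w[h] * col_levels[h][j] for h in range(k)) / s for j in range(n_cols)]
        row_belief = [sum(w[h] * row_levels[h][i] for h in range(k)) / s for i in range(n_rows)]
        row_levels.append(_best_response(row_u, col_belief))
        col_levels.append(_best_response(col_u, row_belief))
    # Predicted population play: mix the levels with Poisson(tau) weights.
    w = [_poisson(k, tau) for k in range(max_level + 1)]
    s = sum(w)
    return [sum(w[k] * row_levels[k][i] for k in range(max_level + 1)) / s for i in range(n_rows)]

# Example: a symmetric Prisoner's Dilemma (actions: cooperate, defect); for a symmetric
# game the column player's matrix equals the row player's matrix.
pd = [[9, 0], [10, 1]]
print(ch_prediction(pd, pd))   # most of the mass is on "defect"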

Page 26:

Agent Behavioral Types (Costa-Gomes et. al.)

• In this model each participant's behavior is determined by one of nine decision rules or types:
 – Altruistic: maximize the sum of payoffs
 – Pessimistic: maximin decision
 – Naïve: BR to the uniform distribution
 – Optimistic: maximax decision
 – L2: BR to Naïve
 – D1: 1 round of deletion of dominated strategies
 – D2: 2 rounds of deletion of dominated strategies
 – Equilibrium
 – Sophisticated: BR to the population distribution

Page 27:

A Machine Learning Approach

• In order to predict the play of the query agent, we suggest learning association rules from the training data.

• Association rules map a strategy in a known game to a strategy in the unknown game.

• Notice that what we suggest is learning in an ensemble of games.

Page 28:

Prediction using Association Rules

• In order to predict the play in the unknown game, association rules are learned from the training data.

• The estimation of probabilities is done using confidence levels of association rules combined with population statistics.
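A toy sketch (ours; the data layout, the mix weight, and the game/action names are made up for illustration) of how such rules can be extracted and blended with population statistics; the actual combination used in the talk is the regression described on the next two slides.

from collections import Counter, defaultdict

# plays[p][g] = action chosen by training player p in game g (toy data layout, ours).
def learn_rules(plays, target_game):
    """Confidence of rules (g, a) => action b in target_game, from the training players."""
    conf = {}
    games = {g for row in plays.values() for g in row if g != target_game}
    for g in games:
        joint, support = defaultdict(Counter), Counter()
        for row in plays.values():
            if g in row and target_game in row:
                joint[row[g]][row[target_game]] += 1
                support[row[g]] += 1
        for a, counts in joint.items():
            for b, c in counts.items():
                conf[(g, a, b)] = c / support[a]
    return conf

def predict(plays, query_row, target_game, mix=0.5):
    """Blend the best applicable rule confidences with population statistics."""
    conf = learn_rules(plays, target_game)
    pop = Counter(row[target_game] for row in plays.values() if target_game in row)
    total = sum(pop.values())
    scores = {}
    for b in set(pop):
        applicable = [conf.get((g, a, b), 0.0) for g, a in query_row.items() if g != target_game]
        best_rule = max(applicable) if applicable else 0.0
        scores[b] = mix * best_rule + (1 - mix) * pop[b] / total
    norm = sum(scores.values()) or 1.0
    return {b: v / norm for b, v in scores.items()}

plays = {
    "p1": {"PD": "cooperate", "Trust": "trust"},
    "p2": {"PD": "defect",    "Trust": "exit"},
    "p3": {"PD": "cooperate", "Trust": "trust"},
}
print(predict(plays, {"PD": "cooperate"}, "Trust"))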

Page 29:

Boosting Technique

• We calculate, for each action i and game g:

 – F_g^i : the frequency with which action i was played

 – C_g^i : the confidence of the best applicable association rule

 – A_g^i : the average confidence of the 10 best applicable association rules

Page 30:

Boosting Technique (cont.)

• Let R_g^i be 1 if strategy i was played in g, and 0 otherwise.

• We use linear regression to estimate constants B0, B1, B2, B3 so that B0 + B1·F_g^i + B2·C_g^i + B3·A_g^i will be as similar as possible to R_g^i over all other players.

• We use this formula to estimate the probability distribution over actions in the unknown game.
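A minimal sketch (ours) of that regression step with NumPy least squares; the feature values below are synthetic and only show the shapes involved, and clipping negative scores before normalizing is our own choice.

import numpy as np

def fit_boosting_weights(F, C, A, R):
    """Least-squares estimate of B0..B3 so that B0 + B1*F + B2*C + B3*A ~ R.

    F, C, A, R are equal-length arrays over all (player, game, action) training cells:
    F = empirical frequency, C = best-rule confidence, A = mean confidence of the
    10 best applicable rules, R = 1 if that action was actually played, else 0.
    """
    X = np.column_stack([np.ones(len(F)), F, C, A])
    B, *_ = np.linalg.lstsq(X, np.asarray(R, dtype=float), rcond=None)
    return B

def predict_distribution(B, feats):
    """Turn the fitted linear scores of the candidate actions into a distribution."""
    scores = np.array([max(B[0] + B[1] * f + B[2] * c + B[3] * a, 0.0) for f, c, a in feats])
    total = scores.sum()
    return scores / total if total > 0 else np.full(len(feats), 1.0 / len(feats))

# Tiny synthetic example (made-up numbers).
F = [0.6, 0.4, 0.7, 0.3]
C = [0.8, 0.2, 0.9, 0.1]
A = [0.7, 0.3, 0.8, 0.2]
R = [1, 0, 1, 0]
B = fit_boosting_weights(F, C, A, R)
print(predict_distribution(B, [(0.5, 0.6, 0.5), (0.5, 0.1, 0.2)]))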

Page 31:

Data Sets

• The Costa-Gomes data set we used is the same data set used by Costa-Gomes et al. in their paper.

• The Camerer data set we used is the same data set used by Camerer et al. in their paper.

• We have conducted an experiment with 96 subjects.

Page 32:

Evaluation Criteria

• Absolute prediction.

• MSE

• MLE

• Best response

Page 33:

Results (Absolute prediction)

[Bar chart: absolute prediction score (0–1200) for each data set (our experiment, Costa-Gomes, Camerer), comparing association rules, the baseline rule, CH rules (estimated Tau), CH rules (Tau = 1.55), and the Costa-Gomes prediction]

Page 34:

Results – MSE score

[Bar chart: MSE score (0–8) for each data set, same five prediction methods]

Page 35:

Results – MLE score

[Bar chart: MLE score (−25 to 0) for each data set, same five prediction methods]

Page 36:

Results – Best response

[Bar chart: best-response score (0–1800) for each data set, same five prediction methods]

Page 37:

Analysis

• Our method of using Machine Learning techniques provides better prediction than the related cognitive theories.

• This shows that our approach of assuming a relation without defining its nature can be useful in prediction.

• Furthermore, the most useful association rules can now be explained.

Page 38:

Some Useful Rules

  9\9    0\10
  10\0   1\1
cooperate in the Prisoner's Dilemma

  10\10   0\9
  9\0     8\8
trust in the Trust Game

  70\120   150\30   80\90
  90\20    200\30   90\90
  150\120  90\150   0\120

  70\70      -10\10
  -100\-10   100\100
Best Sum

Page 39:

Prediction-Based Agents: Concluding Remarks

• Novel application of machine learning to one-shot games.

• Positive experimental results.

• Can be used as a technique for predicting behavior in, e.g., auctions and other complex mechanisms, based on observing behavior in a set of basic games.

Page 40:

Conclusion: re-considering the agent’s perspective

• The agent is not hopeless!

• Introduced competitive safety analysis and showed it may go quite far in non-trivial settings.

• Prediction-based agents employing the "learning in game ensembles" technique can be used, and outperform existing techniques in the cognitive psychology and experimental economics literature.

Page 41:

When CS meets GT

• Game theory has become a central tool for the analysis of multi-agent systems in many areas of computer science.

• The question is whether game theory provides answers to the two central challenges of multi-agent systems:

1. Providing a recommendation to an agent on its course of action in a multi-agent encounter (the agent perspective)

2. Leading a group of rational agents to perform a socially desired behavior (the mediator perspective)

Page 42:

From an agent perspective to a mediator perspective

So far we have been interested in recommendations for an agent acting in a multi-agent environment.

In many systems there exists a reliable entity, such as a router, broker, or system administrator, who may wish to lead the agents to desired behaviors. Hence, we are now interested in:

recommendations provided by a mediator.

Page 43:

From an agent perspective to a mediator perspective

When there is a mediator that attempts to lead rational agents to desired behavior, the Nash equilibrium is a sound concept. It provides a solution that no single agent will want to deviate from, assuming all other agents stick to it.

• However, it does not provide an answer to two major challenges:

a. What about stability against deviations by coalitions?

b. What happens if agents have "minimal rationality", and all that we can assume is that an agent will use a strategy only if it is a dominant strategy, maximizing its payoff regardless of the other agents' actions?

Page 44:

Multi-Agent Systems: A Central Challenge

• Providing agents with strategy profiles that are stable against deviations by subsets of the agents is a most desired property.

• In game theoretic terms: the existence of strong equilibrium is a most desired property.

Unfortunately, strong equilibria rarely exist.

Page 45:

Example: the Prisoner's Dilemma

• In the only equilibrium both agents will defect, yielding both of them a payoff of 1.

• If both agents deviate from defection to cooperation then both of them will gain: mutual defection is not a strong equilibrium.

• More generally, in larger games, strong equilibrium requires stability against deviations by subsets of the agents.

            Cooperate    Defect
Cooperate   4,4          0,6
Defect      6,0          1,1

Page 46:

Mediators

• A mediator is a reliable entity that can interact with the agents, in order to try and lead them towards some useful/rational behavior in a game.

• A mediator cannot enforce behavior.

• The classical example: a mediator can flip coins and recommend behavior to a set of agents, in a way which may improve the social surplus (correlated equilibrium and communication equilibrium).

Page 47:

Action Mediators

• An action mediator can play in the given game on behalf of the agents that give it the right of play.

• Agents may decide to participate in the game directly.

Page 48:

Action Mediators: the Prisoners Dilemma

The mediator offers the agents the following protocol:

• If both agents select to use the mediator's services, then the mediator will perform cooperate on behalf of both agents.

• If only one agent selects to use the mediator's services, then the mediator will perform defect on behalf of that agent.

Notice that when accepting the mediator's services the agent is committed to the actual behavior determined by the above protocol.

However, there is no way to force the agents to accept the suggested protocol, and each agent is free to cooperate or defect without using the mediator's services.

Page 49:

Action Mediators: the Prisoner's Dilemma

• The mediated game has a most desirable property: in this game there is a strong equilibrium; that is, an equilibrium which is stable against deviations by coalitions.

• In this equilibrium both agents will use the mediator's services, which will lead them to a payoff of 4 each!

• We call a strong equilibrium in a mediated game a strong mediated equilibrium. Hence, we get cooperation in the Prisoner's Dilemma as the outcome of a strong mediated equilibrium.

            Mediator    Cooperate    Defect
Mediator    4,4         6,0          1,1
Cooperate   0,6         4,4          0,6
Defect      1,1         6,0          1,1
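A brute-force check (ours) that (Mediator, Mediator) is indeed a strong equilibrium of the mediated game above, i.e., that no unilateral or joint deviation makes every deviator strictly better off.

from itertools import product

actions = ["M", "C", "D"]
# Payoff matrix of the mediated game above: payoff[row][col] = (row payoff, col payoff).
payoff = {
    "M": {"M": (4, 4), "C": (6, 0), "D": (1, 1)},
    "C": {"M": (0, 6), "C": (4, 4), "D": (0, 6)},
    "D": {"M": (1, 1), "C": (6, 0), "D": (1, 1)},
}

def is_strong_equilibrium(profile):
    base = payoff[profile[0]][profile[1]]
    for coalition in [(0,), (1,), (0, 1)]:
        for deviation in product(actions, repeat=len(coalition)):
            new = list(profile)
            for idx, a in zip(coalition, deviation):
                new[idx] = a
            new_pay = payoff[new[0]][new[1]]
            # A deviation breaks the equilibrium if every deviator is strictly better off.
            if all(new_pay[i] > base[i] for i in coalition):
                return False
    return True

print(is_strong_equilibrium(("M", "M")))   # True: both agents using the mediator is stable
print(is_strong_equilibrium(("D", "D")))   # False: jointly switching to cooperate benefits both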

Page 50:

The Power of Mediators

Proposition: Every 2-person game has a strong mediated equilibrium.

We show the existence of strong mediated equilibrium in a variety of other basic settings.

Page 51:

k-strong mediated equilibrium

In a k-strong equilibrium we consider deviations by at most k agents.

The power of mediators:

Theorem: Let Γ be a symmetric game in strategic form with n players, and let 1 ≤ k ≤ n be an integer. If k! divides n then there exists a k-strong mediated equilibrium, leading to optimal surplus. Moreover, if k! does not divide n then there exists a symmetric game with no k-strong mediated equilibrium.

Example: any symmetric game with an even number of agents possesses a 2-strong mediated equilibrium.

Page 52:

Routing Mediators

• Although action mediators illustrate the potential power of a mediator who can act on behalf of agents who give him the right of play, these mediators were very restricted. In general, a routing mediator possesses information about the actions taken by agents.

• Consider a router in a typical communication network. Messages submitted to the system must pass through this router.

In addition, the router can suggest a protocol to the agents who may wish to use its services also in selecting appropriate routes.

• Hence, in this situation it is most natural to assume that the router can observe selected actions of all agents, and not only of those who give it the right of play.

Page 53:

Routing Mediators: Main Result

Some terminology:

In a (correlated) super-strong equilibrium, correlated deviations by any subset of agents cannot benefit one of them without hurting another.

A minimally fair game is a game in which all actions are available to all agents, and all agents who select identical actions get identical payoffs.

Minimally fair games are extremely rich and include all symmetric games, job scheduling games, etc.

Fair efficient outcome: an outcome in which the payoff of the player who gets the minimal payoff is maximized (max-min fairness).

Page 54:

Routing Mediators: Main Result

Main Theorem: Let Γ be a minimally fair game. Then, a fair efficient outcome of Γ can be implemented as a super-strong mediated equilibrium using routing mediators.

Page 55:

From an agent perspective to a mediator perspective

When there is a mediator that attempts to lead rational agents to desired behavior, the Nash equilibrium is a sound concept. It provides a solution that no single agent will want to deviate from, assuming all other agents stick to it.

• However, it does not provide an answer to two major challenges:

a. What about stability against deviations by coalitions?

b. What happens if agents have "minimal rationality", and all that we can assume is that an agent will use a strategy only if it is a dominant strategy, maximizing its payoff regardless of the other agents' actions?

Page 56:

Addressing challenge b: Mediators who can pay (aka k-implementation)

• The important aspect of mediators is their reliability.

• While action/routing mediators can act on behalf of agents who give them the right of play, another type of reliable mediator can instead promise positive payments as a function of the actions selected by the players.

• The reliable party cannot punish agents, expose private secrets, etc.

• The reliable party may wish to minimize his payments.

• The reliable party wishes to have the desired behavior implemented as a dominant strategy.

Page 57:

An example

• A socially optimal action cannot be guaranteed in the famous Battle of the Sexes. Why should agents follow a prescribed equilibrium behavior?

          Boxing    Concert
Boxing    2,1       0,0
Concert   0,0       1,2

Page 58:

An example

• A reliable mediator can promise to pay 10 to the row agent if (Boxing, Concert) is played and to pay 10 to the column agent if (Concert, Boxing) is played.

          Boxing    Concert
Boxing    2,1       0,0
Concert   0,0       1,2

Page 59:

An example

• Boxing becomes a dominant strategy for each of the players!

• The mediator needs to pay nothing!

          Boxing    Concert
Boxing    2,1       10,0
Concert   0,10      1,2
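A quick check (ours) that with the promised payments added, Boxing is dominant for both players and nothing is paid at the resulting outcome.

# Battle of the Sexes with the mediator's promised payments added
# (row gets +10 at (Boxing, Concert), column gets +10 at (Concert, Boxing)).
base = {("B", "B"): (2, 1), ("B", "C"): (0, 0), ("C", "B"): (0, 0), ("C", "C"): (1, 2)}
payments = {("B", "C"): (10, 0), ("C", "B"): (0, 10)}
augmented = {k: (v[0] + payments.get(k, (0, 0))[0], v[1] + payments.get(k, (0, 0))[1])
             for k, v in base.items()}

def dominant_row(action):
    # Strictly better than every other row action, against every column action.
    return all(augmented[(action, c)][0] > augmented[(other, c)][0]
               for c in "BC" for other in "BC" if other != action)

def dominant_col(action):
    return all(augmented[(r, action)][1] > augmented[(r, other)][1]
               for r in "BC" for other in "BC" if other != action)

print(dominant_row("B"), dominant_col("B"))   # True True: Boxing is dominant for both
print(payments.get(("B", "B"), (0, 0)))       # (0, 0): nothing is paid at (Boxing, Boxing)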

Page 60:

A general result

• Any Nash (and even correlated) equilibrium can be implemented as a dominant-strategy equilibrium at 0 cost!

• This type of mediators allows for the implementation of desired behaviors as dominant strategies!

Page 61:

Group Dominant Strategies

• The most desired solution concept is one that guarantees that no subset of the agents can benefit by deviating against any behavior of the rest of the agents.

• If we allow routing mediators who can also promise positive monetary payments based on the actions actually selected in the game we have:

Let Γ be a minimally fair game. Then, a fair efficient outcome of Γ can be implemented in group dominant strategies using "combined mediators", with 0 cost!

Page 62:

Conclusions: The Power of Mediators

• Showed the power of action/routing mediators in obtaining multi-agent behavior which is stable against deviations by coalitions.

• Showed that mediators who can promise positive payments can implement desired behaviors as dominant strategies, while actually spending minimal (and even zero) cost.

• By combining both types of capabilities desired behaviors can be implemented as group dominant strategies with 0 cost, for a rich class of settings.

Page 63:

Recommendations in a social choice setting

• So far we discussed the challenge of recommendations in the context of non-cooperative games.

• We also address this challenge in the cooperative setting of social choice.

• In this talk we will mention only one of the problems faced in this research agenda:

How should I vote as a function of other agents’ votes and the trust-relationships among agents?

Page 64:

A Model

• "Trust graph"
 – Node set N, one node per agent
 – Edge multiset E ⊆ N²
 – An edge from u to v means "u trusts v"; multiple parallel edges indicate more trust

• Votes: disjoint V+, V− ⊆ N
 – V+ is the set of agents that like the item
 – V− is the set of agents that dislike the item

• The recommendation system (software) assigns a recommendation Rs(N, E, V+, V−) ∈ {−, 0, +} to each nonvoter s

[Figure: example trust graph with voters labeled + or −, and the recommendations computed for the nonvoters]

Page 65:

The axiomatic approach

Which system should we use? The axiomatic approach is:

a. Devise a set of natural properties, called axioms, that uniquely characterize a system (a descriptive approach).

b. Devise a set of natural axioms and show whether there are systems that can satisfy them, or any subset of them (the normative approach).

This is the basis of the classical theory of social choice. Arrow's impossibility theorem is a particular example.

The approach has been successfully applied to ranking systems and trust systems (Altman and Tennenholtz).

Page 66:

An Axiomatization

No Groupthink Axiom

• If a set S of nonvoters all get + recommendations, then a majority of the edges from S to N \ S go to + voters or + recommendations.

• If a set S of nonvoters all get − or 0 recommendations, then it cannot be that a majority of the edges from S to N \ S go to + voters or + recommendations.

(and symmetric − conditions)

Theorem: The "No groupthink" axiom uniquely implies the min-cut system on undirected graphs.

Page 67:

The Min-Cut System

(Undirected graphs only)

Def: A cut is a subset of edges that, when removed, leaves no path between − voters and + voters.

Def: A min-cut is a cut of minimal size.

• The recommendation for node s is:
 + if in every min-cut s is connected to a + voter,
 − if in every min-cut s is connected to a − voter,
 0 otherwise.
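A brute-force sketch (ours; practical implementations would use max-flow rather than enumeration, and the toy graph is made up) of the min-cut recommendation on a small undirected trust graph.

from itertools import combinations

def components(nodes, edges):
    # Connected components of an undirected (multi)graph given as an edge list (union-find).
    parent = {v: v for v in nodes}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    for u, v in edges:
        parent[find(u)] = find(v)
    return {v: find(v) for v in nodes}

def separates(nodes, edges, removed, plus, minus):
    comp = components(nodes, [e for i, e in enumerate(edges) if i not in removed])
    return all(comp[p] != comp[m] for p in plus for m in minus)

def min_cut_rec(nodes, edges, plus, minus, s):
    # Find the minimum cut size, then inspect every cut of that size.
    cuts = []
    for size in range(len(edges) + 1):
        cuts = [set(c) for c in combinations(range(len(edges)), size)
                if separates(nodes, edges, set(c), plus, minus)]
        if cuts:
            break
    conn_plus, conn_minus = [], []
    for cut in cuts:
        comp = components(nodes, [e for i, e in enumerate(edges) if i not in cut])
        conn_plus.append(any(comp[s] == comp[p] for p in plus))
        conn_minus.append(any(comp[s] == comp[m] for m in minus))
    if conn_plus and all(conn_plus):
        return "+"
    if conn_minus and all(conn_minus):
        return "-"
    return "0"

# Toy example (made up): nonvoter s trusts two + voters and one - voter.
nodes = ["s", "a", "b", "c"]
edges = [("s", "a"), ("s", "b"), ("s", "c")]
print(min_cut_rec(nodes, edges, plus={"a", "b"}, minus={"c"}, s="s"))   # "+"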

Page 68:

Conclusion :Trust-Based Recommendation Systems

• Extending upon the axiomatic approach to social choice.

• The axiomatic approach provides concrete suggestions based on rigorous foundations.

– Min-cut system for undirected graphs
– Random walk system for directed graphs
– One impossibility theorem

Page 69:

The General Perspective

• Game theory provides a great setting for the analysis of multi-agent systems.

• Two fundamental challenges:

1. How should an agent choose his action in a given game?

2. How can a mediator lead rational agents to desired behavior in a given game?

Page 70:

Addressing the challenges: is it a hopeless battle?

1. How should an agent choose his action in a given game?

   No general answers.

2. How can a mediator lead rational agents to desired behavior in a given game?

   (Correlated) equilibrium provides an extremely weak answer.

Page 71:

Some small steps in an uphill battle

• C-competitive strategies and their applications (congestion, ad auctions).

• Action prediction using learning in an ensemble of games.

• Action/routing mediators.

• K-implementation.

• The axiomatic approach to ranking/trust/recommendation systems.

• Applications to ad auctions and social networks.

Page 72:

Acknowledgements

• Work on the agent perspective is joint work with Alon Altman, Avivit Boden-Bercovici, and Danny Kuminov.

• Work on the mediator perspective is joint work with Dov Monderer, Ola Rozenfeld, and Itai Ashlagi, and expands on ideas suggested in joint work with Yoav Shoham and Yoram Moses on Artificial Social Systems.

• Work on the axiomatic approach to recommendation systems is joint work with R. Anderson, C. Borgs, J. Chayes, U. Feige, A. Flaxman, A. Kalai, and V. Mirrokni, expanding on ideas developed with Alon Altman.

• Many thanks also to Ehud Kalai and Ido Erev.

Page 73:

Acknowledgements

• Based on:

 – Tennenholtz, M. Competitive Safety Analysis: Robust Decision Making in Multi-Agent Systems. Journal of Artificial Intelligence Research, 2002.
 – Kuminov, D., and Tennenholtz, M. Competitive Safety Analysis in Position Auctions. WINE-07.
 – Altman, A., Boden-Bercovici, A., and Tennenholtz, M. Learning in One-Shot Strategic Form Games. ECML-06.
 – Monderer, D., and Tennenholtz, M. Strong Mediated Equilibrium. AAAI-06.
 – Monderer, D., and Tennenholtz, M. K-Implementation. Journal of Artificial Intelligence Research, 2004.
 – Rozenfeld, O., and Tennenholtz, M. Routing Mediators. IJCAI-07.
 – Rozenfeld, O., and Tennenholtz, M. Group Dominant Strategies. WINE-07.
 – Anderson, R., Borgs, C., Chayes, J., Feige, U., Flaxman, A., Kalai, A., Mirrokni, V., and Tennenholtz, M. An Axiomatic Approach to Trust-Based Recommendation Systems. WWW-08.

Page 74:

Thanks!!!