Satisfaction Equilibrium
Stéphane Ross
Canadian AI 2006
Canadian AI 2006 2 / 21
Problem
In real-life multiagent systems:
  Agents generally do not know the preferences (rewards) of their opponents
  Agents may not observe the actions of their opponents
In this context, most game-theoretic solution concepts are hardly applicable
We may try to define equilibrium concepts that:
  do not require complete information
  are achievable through learning, over repeated play
Plan
  Game model
  Satisfaction Equilibrium
  Satisfaction Equilibrium Learning
  Results
  Conclusion
  Questions
Game model
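$n$: Number of agents
$A = A_1 \times \dots \times A_n$: Joint action space
$\Omega$: Set of possible outcomes
$O : A \to \Omega$, the outcome function.
$R_i : \Omega \to \mathbb{R}$, agent $i$'s reward function.
Agent $i$ only knows $A_i$, $\Omega$ and $R_i$.
After each turn, every agent observes an outcome $o \in \Omega$.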
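As a rough illustration of this game model, the following Python sketch separates the full game (action sets, outcome function, reward functions) from what a single agent may know. The class names `Game` and `AgentView`, the outcome labels and all reward values are hypothetical, chosen only for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Action = str
Outcome = str
JointAction = Tuple[Action, ...]


@dataclass
class Game:
    """Full game description; no single agent ever sees all of it."""
    action_sets: List[List[Action]]               # one action set per agent
    outcome_fn: Callable[[JointAction], Outcome]  # joint action -> outcome
    reward_fns: List[Callable[[Outcome], float]]  # one reward function per agent

    def play(self, joint_action: JointAction) -> Outcome:
        return self.outcome_fn(joint_action)


@dataclass
class AgentView:
    """What one agent is allowed to know: its own actions and rewards."""
    actions: List[Action]
    reward_fn: Callable[[Outcome], float]


# Hypothetical 2x2 game with four outcomes; the reward values are made up.
outcomes = {("A", "A"): "a", ("A", "B"): "b", ("B", "A"): "c", ("B", "B"): "d"}
row_rewards = {"a": 1.0, "b": 0.0, "c": 2.0, "d": 0.5}
col_rewards = {"a": 1.0, "b": 2.0, "c": 0.0, "d": 0.5}

game = Game(
    action_sets=[["A", "B"], ["A", "B"]],
    outcome_fn=lambda joint: outcomes[joint],
    reward_fns=[row_rewards.__getitem__, col_rewards.__getitem__],
)
row_view = AgentView(actions=game.action_sets[0], reward_fn=game.reward_fns[0])

# The row agent observes the outcome and evaluates only its own reward.
outcome = game.play(("A", "B"))
row_reward = row_view.reward_fn(outcome)
```

The row agent's view holds no reference to the opponent's actions or rewards, which is the information restriction the model imposes.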
Observations: The agents do not know the game matrix
They are unable to compute best responses and Nash Equilibrium.
They can only reason on their history of actions and rewards.
A    a, ?    b, ?
B    c, ?    d, ?
a, b, c, d
Satisfaction Equilibrium
Since the agents can only reason on their history of payoffs, we may adopt a satisfaction-based reasoning:
  If an agent is satisfied by its current reward, it should keep playing the same strategy
  An unsatisfied agent may decide to change its strategy according to some exploration function
An equilibrium will arise when all agents are satisfied.
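Formally:
$S_i$ is the satisfaction function of agent $i$, applied to its reward $r_i$:
  $S_i(r_i) = 1$ if $r_i \geq \sigma_i$ (agent $i$ is satisfied)
  $S_i(r_i) = 0$ if $r_i < \sigma_i$ (agent $i$ is not satisfied)
$\sigma_i$ is the satisfaction threshold of agent $i$
A joint strategy $s$ is a satisfaction equilibrium if:
  $S_i(r_i) = 1$ for every agent $i$, where $r_i$ is the reward agent $i$ receives under $s$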
Example
Prisoner’s dilemma
     C         D
C   -1, -1    -10, 0
D    0, -10   -8, -8
Dominant strategy: D
Nash Equilibrium: (D,D)
Pareto-Optimal: (C,C), (D,C), (C,D)
Possible satisfaction matrices:
     C       D
C   1, 1    0, 1
D   1, 0    0, 0

     C       D
C   1, 1    0, 1
D   1, 0    1, 1
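The two satisfaction matrices can be reproduced from the payoff matrix with a short Python sketch. It assumes an agent is satisfied when its reward is at least its threshold, and uses thresholds of -1 and -8 for both agents; these values are not stated in the slides and are chosen only because they yield the two matrices shown. The function names are hypothetical.

```python
from itertools import product

ACTIONS = ["C", "D"]
# Payoffs (row agent, column agent) of the prisoner's dilemma above.
PAYOFFS = {
    ("C", "C"): (-1, -1), ("C", "D"): (-10, 0),
    ("D", "C"): (0, -10), ("D", "D"): (-8, -8),
}


def satisfaction(reward, threshold):
    """Assumed rule: 1 if the reward reaches the threshold, else 0."""
    return 1 if reward >= threshold else 0


def satisfaction_matrix(payoffs, thresholds):
    """Map each joint strategy to the pair of satisfaction values."""
    return {
        joint: tuple(satisfaction(r, t) for r, t in zip(rewards, thresholds))
        for joint, rewards in payoffs.items()
    }


def satisfaction_equilibria(payoffs, thresholds):
    """Joint strategies in which every agent is satisfied."""
    sat = satisfaction_matrix(payoffs, thresholds)
    return [joint for joint in product(ACTIONS, repeat=2) if all(sat[joint])]


high = satisfaction_equilibria(PAYOFFS, (-1, -1))  # first matrix
low = satisfaction_equilibria(PAYOFFS, (-8, -8))   # second matrix
```

With the higher thresholds only (C,C) satisfies both agents; lowering them to -8 makes (D,D) a satisfaction equilibrium as well.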
However, even if a satisfaction equilibrium exists, it may be unreachable :
     A       B       C
A   1, 1    0, 1    0, 1
B   1, 0    1, 0    0, 1
C   1, 0    0, 1    1, 0
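The unreachability claim for this matrix can be checked mechanically. The sketch below assumes the satisfaction-based reasoning described earlier: a satisfied agent keeps its action, while an unsatisfied agent may switch to any of its actions. It then searches, from every other starting point, for a path to the equilibrium (A,A). The helper names are hypothetical.

```python
from collections import deque
from itertools import product

ACTIONS = ["A", "B", "C"]
# Satisfaction matrix above: (row agent satisfied, column agent satisfied).
ROWS = [
    [(1, 1), (0, 1), (0, 1)],
    [(1, 0), (1, 0), (0, 1)],
    [(1, 0), (0, 1), (1, 0)],
]
SAT = {(ACTIONS[r], ACTIONS[c]): ROWS[r][c] for r in range(3) for c in range(3)}


def successors(joint):
    """Joint strategies that can follow `joint`: satisfied agents stay put,
    unsatisfied agents may pick any of their actions."""
    options = [[a] if s else ACTIONS for a, s in zip(joint, SAT[joint])]
    return set(product(*options))


def reachable_from(start):
    """Breadth-first search over joint strategies."""
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in successors(queue.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


equilibrium = ("A", "A")
starts_reaching_it = [
    s for s in product(ACTIONS, repeat=2)
    if s != equilibrium and equilibrium in reachable_from(s)
]
```

Under these assumptions `starts_reaching_it` is empty: in every other cell exactly one agent is satisfied, and it is always the one whose action would have to change to reach (A,A).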
Satisfaction Equilibrium Learning
If the satisfaction thresholds are fixed, we only need to apply the satisfaction-based reasoning:
  Choose a strategy randomly
  If satisfied, keep playing the same strategy
  Else choose a new strategy randomly
We can also use other exploration functions which favour actions that have not been explored often Ex:
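A minimal sketch of this fixed-threshold procedure in Python, run on the prisoner's dilemma payoffs with thresholds of -1 for both agents (an assumed value). Two exploration functions are shown: uniform random choice, and a count-based bias whose weighting 1 / (1 + times played) is purely hypothetical and not taken from the talk. The stopping test inside `learn` belongs to the simulator; each agent only ever compares its own reward with its own threshold.

```python
import random

ACTIONS = ["C", "D"]
PAYOFFS = {
    ("C", "C"): (-1, -1), ("C", "D"): (-10, 0),
    ("D", "C"): (0, -10), ("D", "D"): (-8, -8),
}


def uniform_explore(actions, counts, rng):
    """Pick any action with equal probability."""
    return rng.choice(actions)


def biased_explore(actions, counts, rng):
    """Hypothetical bias: favour actions that were played less often."""
    weights = [1.0 / (1 + counts[a]) for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


def learn(thresholds, explore, rng, max_turns=10_000):
    """Repeat play until all agents are satisfied or max_turns is hit."""
    n = len(thresholds)
    counts = [{a: 0 for a in ACTIONS} for _ in range(n)]
    joint = [explore(ACTIONS, counts[i], rng) for i in range(n)]
    for turn in range(max_turns):
        for i, action in enumerate(joint):
            counts[i][action] += 1
        rewards = PAYOFFS[tuple(joint)]
        # Each agent compares only its own reward with its own threshold.
        satisfied = [rewards[i] >= thresholds[i] for i in range(n)]
        if all(satisfied):
            return tuple(joint), turn
        joint = [
            joint[i] if satisfied[i] else explore(ACTIONS, counts[i], rng)
            for i in range(n)
        ]
    return None, max_turns


uniform_result, uniform_turns = learn((-1, -1), uniform_explore, random.Random(0))
biased_result, biased_turns = learn((-1, -1), biased_explore, random.Random(0))
```

With these thresholds (C,C) is the only joint strategy that satisfies both agents, so both runs stop there; the number of turns depends on the random seed.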
To learn the satisfaction thresholds, we use a simple update rule:
  When the agent is satisfied, we increment its satisfaction threshold by some variable $\delta$
  If the agent is unsatisfied, we decrement its satisfaction threshold by $\delta$
  $\delta$ is multiplied by a factor $\gamma$ each turn such that it converges to 0
We also use a limited history of our previous satisfaction states and thresholds for each action to bound the value of the satisfaction threshold
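The update rule can be sketched for a single agent as follows. The class name `ThresholdLearner`, the parameter names `delta` and `gamma`, and the starting values are assumptions made for the example; the history-based bound on the threshold is deliberately left out, since the slides do not give enough detail to reproduce it.

```python
class ThresholdLearner:
    """One agent's satisfaction threshold with a shrinking step size."""

    def __init__(self, threshold, delta, gamma):
        self.threshold = threshold  # current satisfaction threshold
        self.delta = delta          # current step size
        self.gamma = gamma          # decay factor, assumed to lie in (0, 1)

    def update(self, reward):
        """Apply one turn of the update rule and report satisfaction."""
        satisfied = reward >= self.threshold
        if satisfied:
            self.threshold += self.delta  # ask for more next time
        else:
            self.threshold -= self.delta  # lower expectations
        self.delta *= self.gamma          # step shrinks toward 0
        return satisfied


# Hypothetical run: the agent keeps receiving a reward of 1.0.
learner = ThresholdLearner(threshold=0.0, delta=1.0, gamma=0.5)
trace = [learner.update(reward=1.0) for _ in range(4)]
```

In this run the threshold overshoots the reward and then moves back toward it in ever smaller steps, which is why the step size has to decay for the threshold to settle.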
Results
Fixed satisfaction thresholds
  In simple games, we were always able to reach a satisfaction equilibrium.
  Using a biased exploration improves the speed of convergence of the algorithm.
Learning the satisfaction thresholds
  We are generally able to learn the optimal satisfaction equilibrium in simple games.
  Using a biased exploration improves the convergence percentage of the algorithm.
  The decay factor and history size affect the convergence of the algorithm and need to be adjusted to get optimal results.
Conclusion
It is possible to learn stable outcomes without observing anything but our own rewards
Satisfaction equilibria can be defined on any Pareto-Optimal solution.
  However, satisfaction equilibria are not always reachable
The proposed learning algorithms achieve good performance in simple games
  However, they require game-specific adjustments for optimal performance
For more information, you can consult my publications at: http://www.damas.ift.ulaval.ca/~ross
Thank You!