Assignment 2 report: Study of the behavior of different algorithms in 2*2 matrix games through round robin and evolutionary tournaments
Submitted By: Yomna Mahmoud Ibrahim Hassan
Introduction
In this report, I present the analysis behind the design of an algorithm for playing different 2*2 matrix games. I also analyze the behavior of the designed algorithm against seven other algorithms. This analysis is done through two types of tournaments: round-robin and evolutionary.
The main objective while designing the algorithm was to arrive at an algorithm capable of “beating” other algorithms in different games and in different tournaments. “Beating” here means that, on average, it earns a higher payoff than the others. The algorithm also needs to be robust to changes in its conditions; in this report, for instance, I discuss the effect of the “prior” on the algorithm’s performance.
In addition, I discuss the basic requirements of a successful algorithm, as suggested by the results. I also discuss which algorithms affected the results merely by being present: for example, which algorithms acted as “king makers”, giving very high payoffs to some algorithms while decreasing the payoffs of others.
Design
Before designing the algorithm, I ran the round robin tournament with “tit for tat” (TFT), widely regarded as the best algorithm in the prisoner’s dilemma, entered twice (two of the 8 algorithms were TFT). The aim was to see whether any pattern exists in the players’ payoffs across different games. Although these results do not guarantee that an algorithm will also perform well in an evolutionary tournament, they give a rough idea of what to consider while designing the algorithm.
The following tables show the results of running a round-robin tournament (payoffs shown are averages taken over 1000 rounds) on three different games: the prisoner’s dilemma, a modified version of chicken, and the stag hunt. Each entry is the average payoff of the row algorithm against the column algorithm.
Prisoner’s dilemma
Algorithm  TFT      TF2T     Random   Alw. D   Alw. C   Maximin  WinStay  TFT      Average
TFT        3        3        2.078    0.999    3        0.999    1.998    3        2.25925
TF2T       3        3        1.503    0        3        0        0        3        1.687875
Random     1.922    4.076    1.804    0.528    4.012    0.464    2.266    2.057    2.141125
Alw. D     1.004    5        2.932    1        5        1        3        1.004    2.4925
Alw. C     3        3        1.68     0        3        0        0        3        1.71
Maximin    1.004    5        2.956    1        5        1        3        1.004    2.4955
WinStay    2.003    5        2.122    0.5      5        0.5      2.998    2.003    2.51575
TFT        3        3        1.988    0.999    3        0.999    1.998    3        2.248
Modified Chicken
Algorithm  TFT      TF2T     Random   Alw. D   Alw. C   Maximin  WinStay  TFT      Average
TFT        3        3        2.09     1.003    3        3        3.5      3        2.699125
TF2T       3        3        3.545    4        3        3        4        3        3.318125
Random     2.104    4.593    1.972    2.503    4.524    4.476    4.894    1.943    3.376125
Alw. D     1.005    6        3.42     1        6        6        5.995    1.005    3.803125
Alw. C     3        3        3.568    4        3        3        4        3        3.321
Maximin    3        3        3.454    4        3        3        4        3        3.30675
WinStay    3.5      6        4.922    3.997    6        6        2        3.5      4.489875
TFT        3        3        2.054    1.003    3        3        3.5      3        2.694625
Stag hunt
Algorithm  TFT      TF2T     Random   Alw. D   Alw. C   Maximin  WinStay  TFT      Average
TFT        4        4        2.757    1.993    4        1.993    1.993    4        3.092
TF2T       4        4        -0.194   -5       4        -5       -5       4        0.10075
Random     3.071    3.456    2.896    -1.927   3.471    -1.479   -1.01    2.998    1.4345
Alw. D     2.001    3        2.487    2        3        2        2        2.001    2.311125
Alw. C     4        4        -0.95    -5       4        -5       -5       4        0.00625
Maximin    2.001    3        2.554    2        3        2        2        2.001    2.3195
WinStay    2.001    3        2.502    2        3        2        2        2.001    2.313
TFT        4        4        3.032    1.993    4        1.993    1.993    4        3.126375
From the results we deduce the following:
1- The “Random” algorithm confuses all the other algorithms and plays a large role in determining who wins and who does not. Although it is not stable, its payoff is one of the highest. (Note that the randomization follows the probability distribution of the Random function implemented in the .Net framework.)
2- “Win stay” plays very well in the first two games (highest average payoff), and it also performs well in the stag hunt.
3- The algorithms that performed worst were the ones that were too “nice” (tit for two tats and always cooperate). They were easily exploited, especially by really “mean” algorithms such as always defect.
4- Algorithms that are game dependent performed really well, for example Maximin and Win stay.
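The round-robin setup behind the tables above can be sketched roughly as follows. This is a minimal illustration, not the code used for the report: the prisoner’s dilemma payoffs are inferred from the always cooperate and always defect entries in the tables, only three of the eight algorithms are shown, and the names `play_match` and `round_robin` are hypothetical.

```python
# Row player's payoff for (my action, opponent action).
# Assumption: values inferred from the always cooperate / always defect
# entries of the prisoner's dilemma table; the report does not list them.
PD = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def tit_for_tat(my_hist, opp_hist):
    # Cooperate first, then copy the opponent's last move.
    return opp_hist[-1] if opp_hist else "C"


def always_defect(my_hist, opp_hist):
    return "D"


def always_cooperate(my_hist, opp_hist):
    return "C"


def play_match(strat_a, strat_b, payoff, rounds=1000):
    """Return the average per-round payoff of each player (symmetric game)."""
    hist_a, hist_b = [], []
    total_a = total_b = 0
    for _ in range(rounds):
        a = strat_a(hist_a, hist_b)
        b = strat_b(hist_b, hist_a)
        total_a += payoff[(a, b)]
        total_b += payoff[(b, a)]
        hist_a.append(a)
        hist_b.append(b)
    return total_a / rounds, total_b / rounds


def round_robin(strategies, payoff, rounds=1000):
    """Every algorithm meets every algorithm, including a copy of itself."""
    table = {}
    for name_a, strat_a in strategies.items():
        for name_b, strat_b in strategies.items():
            row_avg, _ = play_match(strat_a, strat_b, payoff, rounds)
            table[(name_a, name_b)] = row_avg
    return table


strategies = {
    "TFT": tit_for_tat,
    "AlwaysD": always_defect,
    "AlwaysC": always_cooperate,
}
table = round_robin(strategies, PD)
```

With these assumed payoffs, TFT against always defect averages 0.999 and always defect against TFT averages 1.004, which agrees with the corresponding entries in the prisoner’s dilemma table.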
Another point I wanted to take into consideration is the prior. I therefore ran another tournament in which the prior of every algorithm is that the opponent defects. The following tables show the results in the different games.
Prisoner’s dilemma
Algorithm  TFT      TF2T     Random   Alw. D   Alw. C   Maximin  WinStay  TFT      Average
TFT        1        3.002    2.072    1        3.002    1        2.003    1        1.759875
TF2T       2.997    3        1.479    0        3        0        3        2.997    2.059125
Random     2.023    4.064    1.984    0.426    4.03     0.512    2.895    1.854    2.2235
Alw. D     1        5        2.916    1        5        1        3        1        2.4895
Alw. C     2.997    3        1.482    0        3        0        3        2.997    2.0595
Maximin    1        5        2.904    1        5        1        3        1        2.488
WinStay    1.998    3        2.736    0.5      3        0.5      3        1.998    2.0915
TFT        1        3.003    2.012    1        2.002    1        2.003    1        1.6275
Chicken modified
Algorithm  TFT      TF2T     Random   Alw. D   Alw. C   Maximin  WinStay  TFT      Average
TFT        1        3.003    2.052    1        3.003    3.003    3.5      1        2.195125
TF2T       3.001    3        3.461    4        3        3        3.999    3.001    3.30775
Random     1.862    4.62     1.998    2.356    4.329    2.425    4.815    1.937    3.04275
Alw. D     1        6        3.545    1        6        6        6        1        3.818125
Alw. C     3.001    3        3.569    4        3        3        3.999    3.001    3.32125
Maximin    3.001    3        3.514    4        3        3        3.999    3.001    3.314375
WinStay    3.5      5.997    4.811    4        5.997    5.997    2        3.5      4.47525
TFT        1        3.003    2.02     1        3.003    3.003    3.5      1        2.191125
Stag hunt
Algorithm  TFT      TF2T     Random   Alw. D   Alw. C   Maximin  WinStay  TFT      Average
TFT        2        3.999    2.782    2        3.999    2        1.994    2        2.59675
TF2T       3.991    4        -0.293   -5       4        -5       4        3.991    1.211125
Random     2.809    33.48    2.986    -1.633   3.515    -1.178   -1.303   2.957    5.204125
Alw. D     2        3        2.514    2        3        2        2.001    2        2.314375
Alw. C     3.991    4        -0.698   -5       4        -5       4        3.991    1.1605
Maximin    2        3        2.504    2        3        2        2.001    2        2.313125
WinStay    1.994    4        2.519    1.993    4        1.993    4        1.994    2.811625
TFT        2        3.999    3.015    2        3.999    2        1.994    2        2.625875
We can see that the algorithms most affected by this change were the immediate retaliators. On the other hand, game-dependent algorithms still performed really well in comparison.
From this I reached the main idea of the algorithm, which will evolve over time as I run other tournaments.
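The effect of the prior on an immediate retaliator can be reproduced in a small sketch. It assumes the same inferred prisoner’s dilemma payoffs (3 for mutual cooperation, 1 for mutual defection, 5 and 0 for unilateral defection) and treats the prior as the move the opponent is assumed to have played before the first round; `make_tft` and `self_play_average` are hypothetical names.

```python
# Assumption: payoffs inferred from the tables, not stated in the report.
PD = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def make_tft(prior):
    """Tit for tat whose first move answers an assumed earlier opponent move."""
    def strategy(opp_hist):
        return opp_hist[-1] if opp_hist else prior
    return strategy


def self_play_average(prior, rounds=1000):
    """Average payoff of tit for tat against a copy of itself."""
    player_a, player_b = make_tft(prior), make_tft(prior)
    hist_a, hist_b = [], []
    total = 0
    for _ in range(rounds):
        move_a, move_b = player_a(hist_b), player_b(hist_a)
        total += PD[(move_a, move_b)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return total / rounds


optimistic = self_play_average("C")   # both copies cooperate throughout
pessimistic = self_play_average("D")  # both copies defect throughout
```

Under these assumptions the self-play average drops from 3 to 1 when the prior changes from cooperate to defect, which is the same change seen in the TFT against TFT entries of the two prisoner’s dilemma tables.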
Algorithm
Win Stay Modified
The algorithm is a modified version of “WinStay”. “WinStay” takes only its own previous step into account when judging what to do next. In this algorithm, I take a longer history of my own steps (5 steps) into account. The following is simple pseudo code for the algorithm:
1. For each of my previous 5 steps
2.     If the payoff of the step was higher than average
3.         Increase the vote for the action taken in that step
4. End for loop
5. Take the action with the highest number of votes
The motivation is that taking a longer history may help avoid quick retaliation. The algorithm also takes the game design into account while playing, which is important, as mentioned before.
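The pseudo code above could be turned into Python along the following lines. This is one possible reading, not the report’s implementation: “average” is assumed to be the mean of the four payoffs in the game matrix, ties and empty histories fall back to a default action, and the function name and parameters are hypothetical.

```python
from collections import Counter

# Assumption: prisoner's dilemma payoffs inferred from the tables.
PD = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def win_stay_modified(my_hist, my_payoffs, payoff, window=5, default="C"):
    """Vote over my last `window` steps; a step votes for its action
    only if its payoff beat the average payoff of the game."""
    average = sum(payoff.values()) / len(payoff)  # assumed meaning of "average"
    votes = Counter()
    for action, reward in zip(my_hist[-window:], my_payoffs[-window:]):
        if reward > average:
            votes[action] += 1
    if not votes:
        return default  # no winning step yet (assumption)
    best = max(votes.values())
    winners = [action for action, count in votes.items() if count == best]
    # Ties are resolved with the default action (assumption).
    return winners[0] if len(winners) == 1 else default


# Two rewarding cooperations outvote one rewarding defection.
choice = win_stay_modified(["C", "C", "D", "D", "D"], [3, 3, 5, 1, 1], PD)
```

The window size of 5 comes from the report; the tie-breaking rule and the definition of the average are choices made only for this sketch and would change the behavior if chosen differently.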
I ran both round robin and evolutionary tournaments on the different algorithms. In these tournaments, I gave all the algorithms a misleading prior. The following tables show the results in the 3 different games:
Prisoner’s dilemma
Round robin
Algorithm       TFT      TF2T     Random   Alw. D   Alw. C   Maximin  WinStay  WinStay Mod.    Average
TFT             3        3        2.038    0.999    3        0.999    1.998    0.999           2.004125
TF2T            3        3        1.581    0        3        0        0        0               1.322625
Random          2.038    3.83     1.786    0.456    4.102    0.5      1.758    0.507           1.872125
Alw. D          1.004    5        2.708    1        5        1        3        1               2.464
Alw. C          3        3        1.62     0        3        0        0        0               1.3275
Maximin         1.004    5        3.036    1        5        1        3        1               2.505
WinStay         2.003    5        2.84     0.5      5        0.5      2.998    0.5             2.417625
WinStay Mod.    1.004    4        3.136    1        5        1        3        1               2.3925
Evolutionary
Chicken modified
Round robin
Algorithm       TFT      TF2T     Random   Alw. D   Alw. C   Maximin  WinStay  WinStay Mod.    Average
TFT             3        3        2.164    1.003    3        3        3.5      1.003           2.45875
TF2T            3        3        3.543    4        3        3        4        4               3.442875
Random          2.076    4.452    2.126    2.371    4.674    4.653    4.932    2.548           3.479
Alw. D          1.005    6        3.565    1        6        6        5.995    1               3.820625
Alw. C          3        3        3.496    4        3        3        4        4               3.437
Maximin         3        3        3.483    4        3        3        4        4               3.435375
WinStay         3.5      6        4.913    3.997    6        6        2        3.997           4.550875
WinStay Mod.    1.005    6        3.63     1        6        6        5.995    1               3.82875
[Figure: evolutionary tournament plot. Horizontal axis: 0 to 1000; vertical axis: -0.2 to 0.6. Legend: TFT, TF2T, Random, Always Defect, Always Cooperate, Maximin, WinStay, WinStay Modified 1.]
Evolutionary
Stag hunt
Round robin
Algorithm       TFT      TF2T     Random   Alw. D   Alw. C   Maximin  WinStay  WinStay Mod.    Average
TFT             4        4        2.786    1.993    4        1.993    1.993    1.993           2.84475
TF2T            4        4        -0.797   -5       4        -5       -5       -5              -1.09963
Random          2.977    3.436    3.072    -1.227   3.506    -1.234   -1.262   -1.794          0.93425
Alw. D          2.001    3        2.443    2        3        2        2        2               2.3055
Alw. C          4        4        -0.491   -5       4        -5       -5       -5              -1.06138
Maximin         2.001    3        2.51     2        3        2        2        2               2.313875
WinStay         2.001    3        2.465    2        3        2        2        2               2.30825
WinStay Mod.    2.001    3        2.426    2        3        2        2        2               2.303375
Evolutionary
[Figure: evolutionary tournament plot. Horizontal axis: 0 to 1000; vertical axis: 0 to 0.4. Legend: TFT, TF2T, Random, Always Defect, Always Cooperate, Maximin, WinStay, WinStay Modified.]
From the results we can see that our algorithm did not perform well, even in self play, in either the prisoner’s dilemma or chicken. On the other hand, it performed well in the stag hunt.
Win Stay Modified 2
As can be seen in the previous simulation, the algorithms that performed well were the “nice” algorithms (the ones that never start with a defection). I therefore added to the algorithm the condition of never being the first to defect. The following tables show the results with this new modification.
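The added condition could look like the following sketch. It assumes that “never being the one to defect” means cooperating for as long as the opponent has not defected, and falling back to the 5-step voting rule afterwards; the report does not spell this out, and `vote` and `win_stay_modified_2` are hypothetical names.

```python
from collections import Counter


def vote(my_hist, my_payoffs, average, window=5, default="C"):
    """5-step vote: a step votes for its action if its payoff beat `average`.
    Ties go to the action that received its first vote earliest."""
    recent = zip(my_hist[-window:], my_payoffs[-window:])
    votes = Counter(action for action, reward in recent if reward > average)
    return votes.most_common(1)[0][0] if votes else default


def win_stay_modified_2(my_hist, my_payoffs, opp_hist, average):
    # Assumed reading of the new condition: stay "nice" until provoked.
    if "D" not in opp_hist:
        return "C"
    return vote(my_hist, my_payoffs, average)


# Opponent has only cooperated so far: keep cooperating.
nice_move = win_stay_modified_2(["C", "C"], [3, 3], ["C", "C"], 2.25)
# Opponent defected once: the vote decides, here in favor of defecting.
provoked_move = win_stay_modified_2(
    ["C", "D", "D"], [0, 5, 5], ["D", "C", "C"], 2.25
)
```

The value 2.25 is the mean of the inferred prisoner’s dilemma payoffs (3, 0, 5, 1) and is used here only as an example threshold.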
Prisoner’s dilemma
Round robin
Algorithm       TFT      TF2T     Random   Alw. D   Alw. C   Maximin  WinStay  WinStay Mod. 2  Average
TFT             3        3        2.091    0.999    3        0.999    3        3               2.386125
TF2T            3        3        2.096    0.998    3        0.998    3        3               2.3865
Random          1.965    2.06     1.924    0.502    4.132    0.526    1.95     0.455           1.68925
Alw. D          1.004    1.008    2.968    1        5        1        3        1.012           1.999
Alw. C          3        3        1.332    0        3        0        3        3               2.0415
Maximin         1.004    1.008    3.028    1        5        1        3        1.012           2.0065
WinStay         3        3        2.382    0.5      3        0.5      3        3               2.29775
WinStay Mod. 2  3        3        3.073    0.997    3        0.997    3        3               2.508375
[Figure: evolutionary tournament plot. Horizontal axis: 0 to 1000; vertical axis: -0.2 to 0.6. Legend: TFT, TF2T, Random, Always Defect, Always Cooperate, Maximin, WinStay, WinStay Modified.]
Evolutionary
Chicken modified
Round robin
Algorithm       TFT      TF2T     Random   Alw. D   Alw. C   Maximin  WinStay  WinStay Mod. 2  Average
TFT             3        3        1.996    1.003    3        3        3.5      1.003           2.43775
TF2T            3        3        2.252    1.006    3        3        3.999    1.006           2.532875
Random          2.016    1.846    1.983    2.386    4.569    4.71     4.844    2.599           3.119125
Alw. D          1.005    1.01     3.775    1        6        6        5.995    1               3.223125
Alw. C          3        3        3.474    4        3        3        4        4               3.43425
Maximin         3        3        3.525    4        3        3        4        4               3.440625
WinStay         3.5      4.001    4.921    3.997    6        6        2        3.997           4.302
WinStay Mod. 2  1.005    1.01     2.715    1        6        6        5.995    1               3.090625
Evolutionary
[Figure: evolutionary tournament plot. Horizontal axis: 0 to 1000; vertical axis: 0 to 0.3. Legend: TFT, TF2T, Random, Always Defect, Always Cooperate, Maximin, WinStay, WinStay Modified 2.]
Stag hunt
Round robin
Algorithm       TFT      TF2T     Random   Alw. D   Alw. C   Maximin  WinStay  WinStay Mod. 2  Average
TFT             4        4        2.943    1.993    4        1.993    4        4               3.366125
TF2T            4        4        2.708    1.986    4        1.986    4        4               3.335
Random          2.769    2.842    3.042    -1.332   3.482    -1.276   -1.377   -0.829          0.915125
Alw. D          2.001    2.002    2.544    2        3        2        2.001    2.003           2.193875
Alw. C          4        4        0.184    -5       4        -5       4        4               1.273
Maximin         2.001    2.002    2.537    2        3        2        2.001    2.003           2.193
WinStay         4        4        2.477    1.993    4        1.993    4        4               3.307875
WinStay Mod. 2  4        4        2.526    1.979    4        1.979    4        4               3.3105
Evolutionary
[Figure: evolutionary tournament plot. Horizontal axis: 0 to 1000; vertical axis: 0 to 0.4. Legend: TFT, TF2T, Random, Always Defect, Always Cooperate, Maximin, WinStay, WinStay Modified 2.]
From the results, we can notice the following: on average, the algorithm performed well. The only exception was the game “chicken”, especially in self play, although its average against the other players is not as bad as in self play. After running the simulation several times and following the actions taken, I deduced that one reason it does not perform well in self play is that both agents try to play “cooperate”, which, in our “modified chicken” game, does not give the best possible payoff for either player.
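A short calculation illustrates why mutual cooperation is not the best outcome in this game. The payoffs below are an assumption: they are inferred from the always cooperate and always defect entries of the modified chicken tables (3 for mutual cooperation, 1 for mutual defection, 6 for defecting against a cooperator, 4 for cooperating against a defector), and `average_payoffs` is a hypothetical helper.

```python
# Row player's payoff for (my action, opponent action).
# Assumption: inferred from the modified chicken tables, not stated in the report.
CHICKEN = {("C", "C"): 3, ("C", "D"): 4, ("D", "C"): 6, ("D", "D"): 1}


def average_payoffs(joint_actions):
    """Average payoff of each player over a repeating cycle of joint actions."""
    n = len(joint_actions)
    row = sum(CHICKEN[(a, b)] for a, b in joint_actions) / n
    col = sum(CHICKEN[(b, a)] for a, b in joint_actions) / n
    return row, col


mutual_cooperation = average_payoffs([("C", "C")])
mutual_defection = average_payoffs([("D", "D")])
alternating = average_payoffs([("C", "D"), ("D", "C")])
```

Under these assumed payoffs, two players who always cooperate earn 3 each, while two players who take turns defecting earn an average of 5 each, so a pair that settles on mutual cooperation leaves payoff on the table.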
Win Stay Modified 3
As a result of these simulations, I came to the idea of enhancing the algorithm with a simple version of “fictitious play”, in which the algorithm tries to model the other player based on the history. This is a pseudo code representation of the algorithm:
1. For each previous step in the history (5 previous steps)
2.     If the step payoff is higher than average
3.         Add a vote to my action, then add a vote to the action the other player took at that step
4. End for loop
5. Take the action with the highest votes, based on my side and the other player’s side
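One literal reading of this pseudo code is sketched below: votes from my own actions and from the other player’s actions are pooled into a single tally, and the most voted action is played. How the two sides are actually combined is not specified in the report, so the pooling, the tie handling, and the name `win_stay_modified_3` are assumptions made for illustration.

```python
from collections import Counter


def win_stay_modified_3(my_hist, opp_hist, my_payoffs, average,
                        window=5, default="C"):
    """For each of the last `window` steps whose payoff beat `average`,
    vote for my action and for the opponent's action in that step."""
    votes = Counter()
    recent = zip(my_hist[-window:], opp_hist[-window:], my_payoffs[-window:])
    for mine, theirs, reward in recent:
        if reward > average:
            votes[mine] += 1    # my side
            votes[theirs] += 1  # other player's side
    if not votes:
        return default  # nothing rewarding in the window (assumption)
    # Ties go to the action that received its first vote earliest.
    return votes.most_common(1)[0][0]


# My rewarding steps were mostly defections, but the opponent cooperated in
# all of them, so the pooled vote favors cooperation.
pooled_choice = win_stay_modified_3(
    ["D", "D", "C"], ["C", "C", "C"], [5, 5, 3], average=2.25
)
```

In this example a vote over my own actions alone would pick defect (two votes against one), while the pooled tally picks cooperate, which shows how modeling the other player can change the decision.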
The results of the simulations that included the latest modified version of Win Stay are shown in the following table and figure:
[Figure: evolutionary tournament plot. Horizontal axis: 0 to 1000; vertical axis: -0.05 to 0.3. Legend: TFT, TF2T, Random, Always Defect, Always Cooperate, Maximin, WinStay, WinStay Modified 2.]
Prisoner’s dilemma
Round robin
Algorithm       TFT      TF2T     Random   Alw. D   Alw. C   Maximin  WinStay  WinStay Mod. 3  Average
TFT             3        3        1.843    0.999    3        0.999    3        3               2.355125
TF2T            3        3        1.864    0.998    3        0.998    3        3               2.3575
Random          1.799    2.024    1.98     0.466    4.046    0.507    2.523    0.547           1.7365
Alw. D          1.004    1.008    2.776    1        5        1        3        1.012           1.975
Alw. C          3        3        1.671    0        3        0        3        3               2.083875
Maximin         1.004    1.008    3.076    1        5        1        3        1.012           2.0125
WinStay         3        3        2.167    0.5      3        0.5      3        3               2.270875
WinStay Mod. 3  3        3        2.493    0.997    3        0.997    3        3               2.435875
Evolutionary
Chicken modified
Round robin
Algorithm       TFT      TF2T     Random   Alw. D   Alw. C   Maximin  WinStay  WinStay Mod. 3  Average
TFT             3        3        2.086    1.003    3        3        3.5      1.003           2.449
TF2T            3        3        2.292    1.006    3        3        3.999    1.006           2.537875
Random          1.98     1.948    1.829    2.746    4.368    4.557    4.941    2.581           2.796
Alw. D          1.005    1.01     3.465    1        6        6        5.995    1               3.184375
Alw. C          3        3        3.525    4        3        3        4        4               3.440625
Maximin         3        3        3.55     4        3        3        4        4               3.44375
WinStay         3.5      4.001    4.878    3.997    6        6        2        3.997           4.296625
WinStay Mod. 3  1.005    1.01     3.525    1        6        6        5.995    1               3.191875
[Figure: evolutionary tournament plot. Horizontal axis: 0 to 1000; vertical axis: 0 to 0.3. Legend: TFT, TF2T, Random, Always Defect, Always Cooperate, Maximin, WinStay, WinStay Modified 3.]
Evolutionary
Stag hunt
Round robin
Algorithm       TFT      TF2T     Random   Alw. D   Alw. C   Maximin  WinStay  WinStay Mod. 3  Average
TFT             4        4        2.914    1.993    4        1.993    4        4               3.3625
TF2T            4        4        2.874    1.986    4        1.986    4        4               3.35575
Random          2.865    2.94     3.03     -1.073   3.552    -1.64    -1.016   -0.584          1.00925
Alw. D          2.001    2.002    2.451    2        3        2        2.001    2.003           2.18225
Alw. C          4        4        -0.644   -5       4        -5       4        4               1.1695
Maximin         2.001    2.002    2.482    2        3        2        2.001    2.003           2.186125
WinStay         4        4        2.406    1.993    4        1.993    4        4               3.299
WinStay Mod. 3  4        4        2.531    1.979    4        1.979    4        4               3.311125
[Figure: evolutionary tournament plot. Horizontal axis: 0 to 1000; vertical axis: 0 to 0.3. Legend: TFT, TF2T, Random, Always Defect, Always Cooperate, Maximin, WinStay, WinStay Modified 3.]
Evolutionary
We can see from the results that the algorithm outperforms the other algorithms in all games in the evolutionary tournaments, and that it is one of the top algorithms (first place in the prisoner’s dilemma) in the round robin tournament.
Note that when we compare the average of our algorithm in the latest round robin tournament with that of the previous version, its average is lower, even though it is one of the best in all games. From this we can conclude that being the “best” and “beating” other algorithms does not mean having the best possible average performance against other players.
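The difference between the two tournament types can be made concrete with a sketch of an evolutionary tournament. The report does not describe its update rule, so the discrete replicator update below is an assumption; the payoffs are the round robin entries for three of the algorithms from the prisoner’s dilemma table for WinStay Modified 3, and `replicator_step` and `evolve` are hypothetical names.

```python
def replicator_step(shares, payoff):
    """One generation: each share grows in proportion to its fitness,
    where fitness is the expected payoff against the current population."""
    names = list(shares)
    fitness = {
        i: sum(payoff[i][j] * shares[j] for j in names) for i in names
    }
    mean_fitness = sum(shares[i] * fitness[i] for i in names)
    return {i: shares[i] * fitness[i] / mean_fitness for i in names}


def evolve(shares, payoff, generations=50):
    """Return the list of population shares, one entry per generation."""
    history = [dict(shares)]
    for _ in range(generations):
        shares = replicator_step(shares, payoff)
        history.append(shares)
    return history


# Row algorithm's average payoff against the column algorithm
# (prisoner's dilemma, round robin with WinStay Modified 3).
PAYOFF = {
    "TFT": {"TFT": 3, "AlwaysD": 0.999, "WinStayMod3": 3},
    "AlwaysD": {"TFT": 1.004, "AlwaysD": 1, "WinStayMod3": 1.012},
    "WinStayMod3": {"TFT": 3, "AlwaysD": 0.997, "WinStayMod3": 3},
}

start = {name: 1 / 3 for name in PAYOFF}
history = evolve(start, PAYOFF)
final = history[-1]
```

In this reduced population the share of always defect shrinks toward zero, because its payoff against the two retaliating algorithms is low even though it never loses an individual match. This update only works with positive payoffs, so a game with negative entries such as the stag hunt would need the payoffs shifted first.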
Conclusions
From the results in the different situations, we can see that several factors affect how a given algorithm performs. These include:
1- History given to the different algorithms
2- Other algorithms present in the competition pool (are they retaliatory or not?)
3- Type of tournament (performing well in one type of tournament does not mean excelling in the other)
4- Goal of the algorithm itself (maximizing its own payoff or destroying others’ payoffs)
[Figure: evolutionary tournament plot. Horizontal axis: 0 to 1000; vertical axis: -0.05 to 0.3. Legend: TFT, TF2T, Random, Always Defect, Always Cooperate, Maximin, WinStay, WinStay Modified 3.]
We can also conclude that certain properties, when present in an algorithm, can make it outperform other algorithms. These properties are:
1- Having an optimistic prior: always try not to be the first to defect, and be optimistic that others will cooperate as well.
2- Estimating its own payoff
3- Modeling the other player: having some information about the other players (even by modeling other algorithms through their actions) sometimes gives the algorithm better reliability.