study of the behavior of different algorithms in 2*2 matrix games through round robin and...

Assignment 2 report: Study of the behavior of different algorithms in 2*2 matrix games through round robin and evolutionary tournaments

Submitted By: Yomna Mahmoud Ibrahim Hassan

Introduction

In this report, I describe the analysis behind the design of an algorithm for playing different 2*2 matrix games. I also analyze the behavior of the designed algorithm against 7 other algorithms. The analysis uses two types of tournaments: round-robin and evolutionary.

The main objective while designing the algorithm was to arrive at one capable of “beating” the other algorithms in different games and in different tournaments. Here, “beating” means that on average it gets a higher payoff than the others. The algorithm also needs to be robust to changes in its setting; in this report, for instance, I discuss the effect of the “prior” on the performance of the algorithm.

In addition, I discuss the basic requirements of a successful algorithm, based on the results. I also discuss which algorithms affected the results merely by being present, for example which algorithms acted as “king makers”, giving very high payoffs to some algorithms while decreasing the payoffs of others.

Design

Before designing the algorithm, I ran the round-robin tournament with “tit for tat” (TFT), acclaimed as the best algorithm in the prisoner’s dilemma, entered twice (two of the 8 algorithms are TFT). The aim was to see whether some pattern exists in the players’ payoffs across different games. Although these results do not guarantee good performance in the evolutionary tournament as well, they give a rough idea of what to consider while designing the algorithm.

The following tables show the results of running a round-robin tournament (payoffs shown are averages taken over 1000 rounds) on three different games: the prisoner’s dilemma, a modified version of chicken, and the stag hunt. Each row gives a player’s average payoff against the opponent named in the column.

Prisoner’s dilemma

                TFT      TF2T     Random   Al. D    Al. C    Maximin  Winstay  TFT      Average
TFT             3        3        2.078    0.999    3        0.999    1.998    3        2.25925
TF2T            3        3        1.503    0        3        0        0        3        1.687875
Random          1.922    4.076    1.804    0.528    4.012    0.464    2.266    2.057    2.141125
Al. D           1.004    5        2.932    1        5        1        3        1.004    2.4925
Al. C           3        3        1.68     0        3        0        0        3        1.71
Maximin         1.004    5        2.956    1        5        1        3        1.004    2.4955
Winstay         2.003    5        2.122    0.5      5        0.5      2.998    2.003    2.51575
TFT             3        3        1.988    0.999    3        0.999    1.998    3        2.248

Modified Chicken

                TFT      TF2T     Random   Al. D    Al. C    Maximin  Winstay  TFT      Average
TFT             3        3        2.09     1.003    3        3        3.5      3        2.699125
TF2T            3        3        3.545    4        3        3        4        3        3.318125
Random          2.104    4.593    1.972    2.503    4.524    4.476    4.894    1.943    3.376125
Al. D           1.005    6        3.42     1        6        6        5.995    1.005    3.803125
Al. C           3        3        3.568    4        3        3        4        3        3.321
Maximin         3        3        3.454    4        3        3        4        3        3.30675
Winstay         3.5      6        4.922    3.997    6        6        2        3.5      4.489875
TFT             3        3        2.054    1.003    3        3        3.5      3        2.694625

Stag hunt

                TFT      TF2T     Random   Al. D    Al. C    Maximin  Winstay  TFT      Average
TFT             4        4        2.757    1.993    4        1.993    1.993    4        3.092
TF2T            4        4        -0.194   -5       4        -5       -5       4        0.10075
Random          3.071    3.456    2.896    -1.927   3.471    -1.479   -1.01    2.998    1.4345
Al. D           2.001    3        2.487    2        3        2        2        2.001    2.311125
Al. C           4        4        -0.95    -5       4        -5       -5       4        0.00625
Maximin         2.001    3        2.554    2        3        2        2        2.001    2.3195
Winstay         2.001    3        2.502    2        3        2        2        2.001    2.313
TFT             4        4        3.032    1.993    4        1.993    1.993    4        3.126375
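The entries for the deterministic players are consistent with the payoff matrices below. These matrices are inferred from the tables (for example, Al. D earns 5 against Al. C in the prisoner’s dilemma) and are not stated in the report, so they should be read as an assumption. The following is a minimal Python sketch, with hypothetical function names, that replays 1000-round matches and reproduces a few entries; Random, TF2T, Maximin and Winstay are left out for brevity.

```python
C, D = "C", "D"

# Payoffs to the row player, indexed by (my_action, opponent_action).
# Assumption: inferred from the round-robin tables, not given in the report.
GAMES = {
    "prisoners_dilemma": {(C, C): 3, (C, D): 0, (D, C): 5, (D, D): 1},
    "modified_chicken": {(C, C): 3, (C, D): 4, (D, C): 6, (D, D): 1},
    "stag_hunt": {(C, C): 4, (C, D): -5, (D, C): 3, (D, D): 2},
}

def tit_for_tat(my_history, opp_history):
    # Cooperate first, then copy the opponent's previous action.
    return opp_history[-1] if opp_history else C

def always_defect(my_history, opp_history):
    return D

def always_cooperate(my_history, opp_history):
    return C

def play_match(player_a, player_b, payoff, rounds=1000):
    """Return the average per-round payoff of each player."""
    hist_a, hist_b = [], []
    total_a = total_b = 0
    for _ in range(rounds):
        act_a = player_a(hist_a, hist_b)
        act_b = player_b(hist_b, hist_a)
        total_a += payoff[(act_a, act_b)]
        total_b += payoff[(act_b, act_a)]
        hist_a.append(act_a)
        hist_b.append(act_b)
    return total_a / rounds, total_b / rounds

def round_robin(players, payoff, rounds=1000):
    """Average payoff of every player against every player, self play included."""
    return {
        name_a: {name_b: play_match(pa, pb, payoff, rounds)[0]
                 for name_b, pb in players.items()}
        for name_a, pa in players.items()
    }

players = {"TFT": tit_for_tat, "Al. D": always_defect, "Al. C": always_cooperate}
table = round_robin(players, GAMES["prisoners_dilemma"])
print(table["TFT"]["Al. D"], table["Al. D"]["TFT"])  # 0.999 1.004
```

TFT loses only the first round to Al. D, which is why the two entries differ from 1 by a few thousandths.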

From the results we deduce the following:

1- The “Random” algorithm confuses all the other algorithms and plays a large role in determining who wins and who does not. Although it is not stable, its payoff is one of the highest. (Note that the randomization follows the probability distribution of the Random function implemented in the .NET framework.)

2- “Win stay” plays very well in the first 2 games (highest average), and it performs well in the stag hunt too.

3- The algorithms that played worst were the ones that were too “nice” (tit for 2 tat and always cooperate). They were easily exploited, especially by really “mean” algorithms such as always defect.

4- Algorithms that are game dependent performed really well, for example Maximin and Win stay.

Another point I wanted to take into consideration is the prior. I therefore ran another tournament in which the prior of all algorithms is that the opponent defects. The following tables show the results in the different games.


Prisoner’s dilemma

                TFT      TF2T     Random   Al. D    Al. C    Maximin  Winstay  TFT      Average
TFT             1        3.002    2.072    1        3.002    1        2.003    1        1.759875
TF2T            2.997    3        1.479    0        3        0        3        2.997    2.059125
Random          2.023    4.064    1.984    0.426    4.03     0.512    2.895    1.854    2.2235
Al. D           1        5        2.916    1        5        1        3        1        2.4895
Al. C           2.997    3        1.482    0        3        0        3        2.997    2.0595
Maximin         1        5        2.904    1        5        1        3        1        2.488
Winstay         1.998    3        2.736    0.5      3        0.5      3        1.998    2.0915
TFT             1        3.003    2.012    1        2.002    1        2.003    1        1.6275

Chicken modified

                TFT      TF2T     Random   Al. D    Al. C    Maximin  Winstay  TFT      Average
TFT             1        3.003    2.052    1        3.003    3.003    3.5      1        2.195125
TF2T            3.001    3        3.461    4        3        3        3.999    3.001    3.30775
Random          1.862    4.62     1.998    2.356    4.329    2.425    4.815    1.937    3.04275
Al. D           1        6        3.545    1        6        6        6        1        3.818125
Al. C           3.001    3        3.569    4        3        3        3.999    3.001    3.32125
Maximin         3.001    3        3.514    4        3        3        3.999    3.001    3.314375
Winstay         3.5      5.997    4.811    4        5.997    5.997    2        3.5      4.47525
TFT             1        3.003    2.02     1        3.003    3.003    3.5      1        2.191125

Stag hunt

                TFT      TF2T     Random   Al. D    Al. C    Maximin  Winstay  TFT      Average
TFT             2        3.999    2.782    2        3.999    2        1.994    2        2.59675
TF2T            3.991    4        -0.293   -5       4        -5       4        3.991    1.211125
Random          2.809    33.48    2.986    -1.633   3.515    -1.178   -1.303   2.957    5.204125
Al. D           2        3        2.514    2        3        2        2.001    2        2.314375
Al. C           3.991    4        -0.698   -5       4        -5       4        3.991    1.1605
Maximin         2        3        2.504    2        3        2        2.001    2        2.313125
Winstay         1.994    4        2.519    1.993    4        1.993    4        1.994    2.811625
TFT             2        3.999    3.015    2        3.999    2        1.994    2        2.625875

We can see that the algorithms most affected by this change were the immediate retaliators, such as TFT. On the other hand, game-dependent algorithms still performed really well in comparison.
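The effect of the prior on an immediate retaliator can be illustrated with a short sketch. It assumes that the prior is simply the action TFT attributes to the opponent before the first round, and it uses prisoner’s dilemma payoffs inferred from the tables; both are assumptions, and the function names are hypothetical.

```python
C, D = "C", "D"

# Prisoner's dilemma payoffs to the row player (assumption: inferred from the tables).
PD = {(C, C): 3, (C, D): 0, (D, C): 5, (D, D): 1}

def make_tit_for_tat(prior):
    """TFT that treats `prior` as the opponent's action before round one."""
    def strategy(opp_history):
        return opp_history[-1] if opp_history else prior
    return strategy

def always_cooperate(opp_history):
    return C

def average_payoffs(player_a, player_b, rounds=1000):
    """Average per-round payoff of each player over a repeated match."""
    hist_a, hist_b = [], []
    total_a = total_b = 0
    for _ in range(rounds):
        act_a, act_b = player_a(hist_b), player_b(hist_a)
        total_a += PD[(act_a, act_b)]
        total_b += PD[(act_b, act_a)]
        hist_a.append(act_a)
        hist_b.append(act_b)
    return total_a / rounds, total_b / rounds

optimistic = average_payoffs(make_tit_for_tat(C), make_tit_for_tat(C))
pessimistic = average_payoffs(make_tit_for_tat(D), make_tit_for_tat(D))
print(optimistic, pessimistic)  # (3.0, 3.0) (1.0, 1.0)
print(average_payoffs(make_tit_for_tat(D), always_cooperate))  # (3.002, 2.997)
```

Under these assumptions two TFT players with a defecting prior lock into mutual defection, which matches the drop of the TFT against TFT entry from 3 to 1 in the prisoner’s dilemma tables.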

From this I reached the main idea of the algorithm, which will evolve over time as I run other tournaments.

Algorithm

Win Stay Modified

The algorithm is a modified version of “WinStay”. “WinStay” only takes its own previous step into account when deciding. In this algorithm, I take a longer history (5 steps) of my own steps into account. The following is simple pseudo code for the algorithm:

1. For each of my previous 5 steps
2.     If its payoff was higher than average
3.         Increase the vote for the action taken in that step
4. End for loop
5. Take the action with the highest number of votes.
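The pseudo code above can be sketched in Python as follows. The report does not say which average the payoff is compared with, what happens when no step earns a vote, or how ties are broken, so the choices below (mean of the game’s payoff matrix, a `default` action, ties resolved toward `default`) are assumptions, as are the function name and the inferred prisoner’s dilemma payoffs.

```python
from collections import Counter

C, D = "C", "D"

# Prisoner's dilemma payoffs to the row player (assumption: inferred from the tables).
PD = {(C, C): 3, (C, D): 0, (D, C): 5, (D, D): 1}

def win_stay_modified(my_history, my_payoffs, payoff, window=5, default=C):
    """Vote over the last `window` steps and return the most voted action."""
    # Assumption: "average" is the mean of the four entries of the payoff matrix.
    average = sum(payoff.values()) / len(payoff)
    votes = Counter()
    for action, reward in zip(my_history[-window:], my_payoffs[-window:]):
        if reward > average:
            votes[action] += 1
    if not votes:
        # Assumption: with no winning step in the window, play `default`.
        return default
    best = max(votes.values())
    winners = [action for action, count in votes.items() if count == best]
    # Assumption: ties are broken in favor of `default`.
    return default if default in winners else winners[0]

# Last five steps: three defections that paid 5 and two cooperations that paid 3.
history = [D, C, D, C, D]
payoffs = [5, 3, 5, 3, 5]
print(win_stay_modified(history, payoffs, PD))  # D
```

With a matrix mean of 2.25, all five steps count as wins, so defection collects three votes against two for cooperation.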

The motivation is that by taking a longer history I may avoid quick retaliation. The algorithm also takes the game design into account while playing, which is important as mentioned before.

I ran both round-robin and evolutionary tournaments on the different algorithms. In these tournaments, I gave all the algorithms a misleading prior. The following tables and figures present the results in the 3 different games:


Prisoner’s dilemma

Round robin

                TFT      TF2T     Random   Al. D    Al. C    Maximin  Winstay  Winstay Mod.    Average
TFT             3        3        2.038    0.999    3        0.999    1.998    0.999           2.004125
TF2T            3        3        1.581    0        3        0        0        0               1.322625
Random          2.038    3.83     1.786    0.456    4.102    0.5      1.758    0.507           1.872125
Al. D           1.004    5        2.708    1        5        1        3        1               2.464
Al. C           3        3        1.62     0        3        0        0        0               1.3275
Maximin         1.004    5        3.036    1        5        1        3        1               2.505
Winstay         2.003    5        2.84     0.5      5        0.5      2.998    0.5             2.417625
WinStay Mod.    1.004    4        3.136    1        5        1        3        1               2.3925

Evolutionary

Chicken modified

Round robin

                TFT      TF2T     Random   Al. D    Al. C    Maximin  Winstay  Winstay Mod.    Average
TFT             3        3        2.164    1.003    3        3        3.5      1.003           2.45875
TF2T            3        3        3.543    4        3        3        4        4               3.442875
Random          2.076    4.452    2.126    2.371    4.674    4.653    4.932    2.548           3.479
Al. D           1.005    6        3.565    1        6        6        5.995    1               3.820625
Al. C           3        3        3.496    4        3        3        4        4               3.437
Maximin         3        3        3.483    4        3        3        4        4               3.435375
Winstay         3.5      6        4.913    3.997    6        6        2        3.997           4.550875
WinStay Mod.    1.005    6        3.63     1        6        6        5.995    1               3.82875

Evolutionary

[Figure: evolutionary tournament plot. Horizontal axis: 0 to 1000; vertical axis: -0.2 to 0.6. Legend: TFT, TF2T, Random, Always Defect, Always Cooperate, Maximin, WinStay, WinStay Modified 1.]

Stag hunt

Round robin

                TFT      TF2T     Random   Al. D    Al. C    Maximin  Winstay  Winstay Mod.    Average
TFT             4        4        2.786    1.993    4        1.993    1.993    1.993           2.84475
TF2T            4        4        -0.797   -5       4        -5       -5       -5              -1.09963
Random          2.977    3.436    3.072    -1.227   3.506    -1.234   -1.262   -1.794          0.93425
Al. D           2.001    3        2.443    2        3        2        2        2               2.3055
Al. C           4        4        -0.491   -5       4        -5       -5       -5              -1.06138
Maximin         2.001    3        2.51     2        3        2        2        2               2.313875
Winstay         2.001    3        2.465    2        3        2        2        2               2.30825
WinStay Mod.    2.001    3        2.426    2        3        2        2        2               2.303375

Evolutionary

[Figure: evolutionary tournament plot. Horizontal axis: 0 to 1000; vertical axis: 0 to 0.4. Legend: TFT, TF2T, Random, Always Defect, Always Cooperate, Maximin, WinStay, WinStay Modified.]

From the results we can see that our algorithm didn’t perform well, even in self play, in both prisoner’s dilemma and chicken. On the other hand it performed well in stag hunt.

Win Stay Modified 2

As we can notice in the previous simulation, the algorithms that performed well were the “nice” algorithms (the ones that never start with a defection). The following tables show the results with a new modification of the algorithm: I added the condition of never being the first one to defect.


Prisoner’s dilemma

Round robin

                TFT      TF2T     Random   Al. D    Al. C    Maximin  Winstay  Winstay Mod. 2  Average
TFT             3        3        2.091    0.999    3        0.999    3        3               2.386125
TF2T            3        3        2.096    0.998    3        0.998    3        3               2.3865
Random          1.965    2.06     1.924    0.502    4.132    0.526    1.95     0.455           1.68925
Al. D           1.004    1.008    2.968    1        5        1        3        1.012           1.999
Al. C           3        3        1.332    0        3        0        3        3               2.0415
Maximin         1.004    1.008    3.028    1        5        1        3        1.012           2.0065
Winstay         3        3        2.382    0.5      3        0.5      3        3               2.29775
WinStay Mod. 2  3        3        3.073    0.997    3        0.997    3        3               2.508375

Evolutionary

[Figure: evolutionary tournament plot. Horizontal axis: 0 to 1000; vertical axis: -0.2 to 0.6. Legend: TFT, TF2T, Random, Always Defect, Always Cooperate, Maximin, WinStay, WinStay Modified.]

Chicken modified

Round robin


                TFT      TF2T     Random   Al. D    Al. C    Maximin  Winstay  Winstay Mod. 2  Average
TFT             3        3        1.996    1.003    3        3        3.5      1.003           2.43775
TF2T            3        3        2.252    1.006    3        3        3.999    1.006           2.532875
Random          2.016    1.846    1.983    2.386    4.569    4.71     4.844    2.599           3.119125
Al. D           1.005    1.01     3.775    1        6        6        5.995    1               3.223125
Al. C           3        3        3.474    4        3        3        4        4               3.43425
Maximin         3        3        3.525    4        3        3        4        4               3.440625
Winstay         3.5      4.001    4.921    3.997    6        6        2        3.997           4.302
WinStay Mod. 2  1.005    1.01     2.715    1        6        6        5.995    1               3.090625

Evolutionary

[Figure: evolutionary tournament plot. Horizontal axis: 0 to 1000; vertical axis: 0 to 0.3. Legend: TFT, TF2T, Random, Always Defect, Always Cooperate, Maximin, WinStay, WinStay Modified 2.]

Stag hunt

Round robin


                TFT      TF2T     Random   Al. D    Al. C    Maximin  Winstay  Winstay Mod. 2  Average
TFT             4        4        2.943    1.993    4        1.993    4        4               3.366125
TF2T            4        4        2.708    1.986    4        1.986    4        4               3.335
Random          2.769    2.842    3.042    -1.332   3.482    -1.276   -1.377   -0.829          0.915125
Al. D           2.001    2.002    2.544    2        3        2        2.001    2.003           2.193875
Al. C           4        4        0.184    -5       4        -5       4        4               1.273
Maximin         2.001    2.002    2.537    2        3        2        2.001    2.003           2.193
Winstay         4        4        2.477    1.993    4        1.993    4        4               3.307875
WinStay Mod. 2  4        4        2.526    1.979    4        1.979    4        4               3.3105

Evolutionary

[Figure: evolutionary tournament plot. Horizontal axis: 0 to 1000; vertical axis: 0 to 0.4. Legend: TFT, TF2T, Random, Always Defect, Always Cooperate, Maximin, WinStay, WinStay Modified 2.]

From the results, we can notice the following: on average, the algorithm performed well. The only exception was the game “chicken”, especially in self play, although its average against the other players is not as bad as in self play. After running the simulation several times and following the actions taken, I deduced that one reason it does not perform well in self play is that both agents try to play “cooperate”, which in our game “modified chicken” does not give the best possible payoff for either player.
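A small worked example shows why mutual cooperation is not the best outcome in this game. It uses modified chicken payoffs inferred from the round-robin tables (3 for mutual cooperation, 6 and 4 when one player defects against a cooperator), which is an assumption, and compares constant mutual cooperation with a hypothetical pattern in which the two players take turns defecting.

```python
C, D = "C", "D"

# Modified chicken payoffs to the row player (assumption: inferred from the tables).
CHICKEN = {(C, C): 3, (C, D): 4, (D, C): 6, (D, D): 1}

def average_payoff(my_actions, opp_actions, payoff=CHICKEN):
    """Average payoff to the first player over two fixed action sequences."""
    rewards = [payoff[(mine, theirs)] for mine, theirs in zip(my_actions, opp_actions)]
    return sum(rewards) / len(rewards)

rounds = 1000
both_cooperate = average_payoff([C] * rounds, [C] * rounds)

# Alternating roles: one player defects on even rounds, the other on odd rounds.
first = [D if t % 2 == 0 else C for t in range(rounds)]
second = [C if t % 2 == 0 else D for t in range(rounds)]

print(both_cooperate)                 # 3.0
print(average_payoff(first, second))  # 5.0
print(average_payoff(second, first))  # 5.0
```

Under these payoffs even the player who cooperates against a defector earns 4, more than the 3 from mutual cooperation, so two agents that both insist on cooperating leave payoff on the table.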

Win Stay Modified 3

As a result of these simulations, I arrived at the idea of enhancing the algorithm with a simple version of “fictitious play”, in which the algorithm tries to model the other player based on the history. The following is a pseudo code representation of the algorithm:

1. For each previous step in the history (5 previous steps)
2.     If the step payoff is higher than average
3.         Add a vote to my action, then add a vote to the action the other player took at that step
4. End of for loop
5. Take the action with the highest votes based on my side and the other player’s side
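Steps 1 to 4 of the pseudo code can be sketched as below. The report does not specify how the two tallies are combined in step 5, so the sketch stops at the tallies and only shows the opponent’s most voted action as a simple prediction in the spirit of fictitious play. The function name, the use of the payoff-matrix mean as “average”, and the inferred prisoner’s dilemma payoffs are assumptions.

```python
from collections import Counter

C, D = "C", "D"

# Prisoner's dilemma payoffs to the row player (assumption: inferred from the tables).
PD = {(C, C): 3, (C, D): 0, (D, C): 5, (D, D): 1}

def tally_votes(my_history, opp_history, my_payoffs, payoff, window=5):
    """Return (my_votes, opp_votes) collected over the last `window` steps."""
    # Assumption: "average" is the mean of the four entries of the payoff matrix.
    average = sum(payoff.values()) / len(payoff)
    my_votes, opp_votes = Counter(), Counter()
    recent = zip(my_history[-window:], opp_history[-window:], my_payoffs[-window:])
    for mine, theirs, reward in recent:
        if reward > average:
            my_votes[mine] += 1      # vote for the action I took
            opp_votes[theirs] += 1   # vote for the action the other player took
    return my_votes, opp_votes

my_history = [C, C, D, C, C]
opp_history = [C, C, C, D, C]
my_payoffs = [3, 3, 5, 0, 3]

my_votes, opp_votes = tally_votes(my_history, opp_history, my_payoffs, PD)
print(dict(my_votes), dict(opp_votes))  # {'C': 3, 'D': 1} {'C': 4}

# A simple opponent model: predict the opponent's most voted action.
predicted = opp_votes.most_common(1)[0][0] if opp_votes else None
print(predicted)  # C
```

The step in which the opponent defected paid 0, below the matrix mean of 2.25, so it contributes to neither tally.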

The results of the simulations that included the latest modified version of Win Stay are shown in the following tables and figures:

[Figure: evolutionary tournament plot. Horizontal axis: 0 to 1000; vertical axis: -0.05 to 0.3. Legend: TFT, TF2T, Random, Always Defect, Always Cooperate, Maximin, WinStay, Winstay Modified 2.]


Prisoner’s dilemma

Round robin

                TFT      TF2T     Random   Al. D    Al. C    Maximin  Winstay  Winstay Mod. 3  Average
TFT             3        3        1.843    0.999    3        0.999    3        3               2.355125
TF2T            3        3        1.864    0.998    3        0.998    3        3               2.3575
Random          1.799    2.024    1.98     0.466    4.046    0.507    2.523    0.547           1.7365
Al. D           1.004    1.008    2.776    1        5        1        3        1.012           1.975
Al. C           3        3        1.671    0        3        0        3        3               2.083875
Maximin         1.004    1.008    3.076    1        5        1        3        1.012           2.0125
Winstay         3        3        2.167    0.5      3        0.5      3        3               2.270875
WinStay Mod. 3  3        3        2.493    0.997    3        0.997    3        3               2.435875

Evolutionary

Chicken modified

Round robin


                TFT      TF2T     Random   Al. D    Al. C    Maximin  Winstay  Winstay Mod. 3  Average
TFT             3        3        2.086    1.003    3        3        3.5      1.003           2.449
TF2T            3        3        2.292    1.006    3        3        3.999    1.006           2.537875
Random          1.98     1.948    1.829    2.746    4.368    4.557    4.941    2.581           2.796
Al. D           1.005    1.01     3.465    1        6        6        5.995    1               3.184375
Al. C           3        3        3.525    4        3        3        4        4               3.440625
Maximin         3        3        3.55     4        3        3        4        4               3.44375
Winstay         3.5      4.001    4.878    3.997    6        6        2        3.997           4.296625
WinStay Mod. 3  1.005    1.01     3.525    1        6        6        5.995    1               3.191875

Evolutionary

[Figure: evolutionary tournament plot. Horizontal axis: 0 to 1000; vertical axis: 0 to 0.3. Legend: TFT, TF2T, Random, Always Defect, Always Cooperate, Maximin, WinStay, Winstay Modified 3.]

Stag hunt

Round robin


                TFT      TF2T     Random   Al. D    Al. C    Maximin  Winstay  Winstay Mod. 3  Average
TFT             4        4        2.914    1.993    4        1.993    4        4               3.3625
TF2T            4        4        2.874    1.986    4        1.986    4        4               3.35575
Random          2.865    2.94     3.03     -1.073   3.552    -1.64    -1.016   -0.584          1.00925
Al. D           2.001    2.002    2.451    2        3        2        2.001    2.003           2.18225
Al. C           4        4        -0.644   -5       4        -5       4        4               1.1695
Maximin         2.001    2.002    2.482    2        3        2        2.001    2.003           2.186125
Winstay         4        4        2.406    1.993    4        1.993    4        4               3.299
WinStay Mod. 3  4        4        2.531    1.979    4        1.979    4        4               3.311125

Evolutionary

[Figure: evolutionary tournament plot. Horizontal axis: 0 to 1000; vertical axis: 0 to 0.3. Legend: TFT, TF2T, Random, Always Defect, Always Cooperate, Maximin, WinStay, Winstay Modified 3.]

We can see from the results that the algorithm outperforms the other algorithms in all games in the evolutionary tournaments, and that it is one of the top algorithms (first place in the prisoner’s dilemma) in the round-robin tournament.
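The report does not state how the evolutionary tournament updates the population. One common choice, assumed here purely for illustration, is a discrete replicator update in which each strategy’s share grows in proportion to its average payoff against the current population. The sketch below applies it to three strategies, using round-robin averages taken from the first prisoner’s dilemma table; the function names and the number of generations are hypothetical.

```python
# Round-robin averages from the first prisoner's dilemma table:
# PAYOFF[row][col] is the row strategy's average payoff against the column strategy.
NAMES = ["TFT", "Al. D", "Al. C"]
PAYOFF = [
    [3.0, 0.999, 3.0],
    [1.004, 1.0, 5.0],
    [3.0, 0.0, 3.0],
]

def replicator_step(shares, payoff):
    """One generation: each share is rescaled by fitness over mean fitness."""
    n = len(shares)
    fitness = [sum(payoff[i][j] * shares[j] for j in range(n)) for i in range(n)]
    mean_fitness = sum(f * s for f, s in zip(fitness, shares))
    return [s * f / mean_fitness for s, f in zip(shares, fitness)]

def evolve(shares, payoff, generations=1000):
    """Return the list of population shares after every generation."""
    history = [shares]
    for _ in range(generations):
        shares = replicator_step(shares, payoff)
        history.append(shares)
    return history

# Assumption: every strategy starts with an equal share of the population.
history = evolve([1 / 3, 1 / 3, 1 / 3], PAYOFF)
final = dict(zip(NAMES, history[-1]))
print({name: round(share, 3) for name, share in final.items()})
```

In this reduced pool Al. D profits from Al. C only while Al. C is common, which illustrates how a strong round-robin average need not translate into survival in an evolutionary setting.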

Note what happens if we compare the average of our algorithm in the latest round-robin tournament with the performance of the previous version. Although the algorithm is one of the best in all games, its prisoner’s dilemma average (2.435875) is lower than that of the previous version (2.508375). From this we can conclude that being the “best” and “beating” other algorithms does not mean having the best possible performance with respect to other players on average.

Conclusions

From the results in the different situations, we can see that several factors affect how a certain algorithm performs. These include:

1- History given to the different algorithms

2- Other algorithms existing in the competition pool (are they retaliatory or not?)

3- Type of tournament (we can see that performing well in one type of tournament does not mean excelling in the other)

4- Goal of the algorithm itself (maximizing its own payoff or destroying others’ payoffs)

[Figure: evolutionary tournament plot. Horizontal axis: 0 to 1000; vertical axis: -0.05 to 0.3. Legend: TFT, TF2T, Random, Always Defect, Always Cooperate, Maximin, WinStay, Winstay Modified 3.]

We can also conclude that certain properties, if present in an algorithm, can make it outperform other algorithms. These properties are:

1- Having an optimistic prior: always try not to be the one who defects first, and be optimistic that others will cooperate as well.

2- Estimating my own payoff

3- Modeling the other player: having some information about other players (even by modeling other algorithms through their actions) sometimes gives the algorithm better reliability.
