September 21, 2021
Post questions to the Moodle Forum!
This lab will give you practice with functions and conditionals in the context of simulating a classic game theory problem, The Prisoner’s Dilemma.
You will need to download the following files to do this lab: prisoner2.py
, prisoner3.py
, and history.py
. Your solutions to Problems 1–3 (optionally 4) should go in prisoner2.py
and your solutions to Problems 5–6 (optionally 7) should go in prisoner3.py
. You will not need to modify history.py
at all. You will need to download it into the same folder as prisoner2.py
and prisoner3.py
. When you need to get information about a player’s history, refer to the documentation for the History
object. All the operations you will need are described there. Python takes care of providing the self
parameter for you (e.g., get_length
is called with no arguments and get_recent_coop
is called with one argument).
Suggested timeline:
The problems marked (CHALLENGE) are a way to go above and beyond what is expected for a lab. Only attempt them after you are completely finished with all other parts of the lab. Completing all the non-challenge components of a lab will earn 28.5/30 (95%, an A). Earning challenge points can result in a grade higher than 100%.
Problems marked (OPTIONAL) are just for fun! :)
Let’s begin with a story.
Two suspects are arrested by the police. The police have insufficient evidence for a conviction, and, having separated the prisoners, visit each of them to offer the same deal. If one testifies for the prosecution against the other (defects) and the other remains silent (cooperates), the defector goes free and the silent accomplice receives the full one-year sentence. If both remain silent, both prisoners are sentenced to only one month in jail for a minor charge. If each betrays the other, each receives a three-month sentence. Each prisoner must choose to betray the other or to remain silent. Each one is assured that the other would not know about the betrayal before the end of the investigation. How should the prisoners act?
This story is an example of a classic game theoretic problem called the prisoner’s dilemma. A prisoner’s dilemma always involves two game players, and each has a choice between cooperating and defecting. If the two players cooperate, they each do moderately well; if they both defect, they each do moderately poorly. If one player cooperates and the other defects, then the defector does extremely well and the cooperator does extremely poorly. Before going any further, we’ll define some game theory notation.
In game theory, we differentiate between a game, and a play. A game refers to the set of possible choices and outcomes for the entire range of situations. A play refers to a specific set of choices by the players, with the associated outcome for that particular scenario. Thus, in game theory, a two-person binary-choice game is represented by a two-by-two matrix. Here is a hypothetical game matrix (which, by the way, does not represent a prisoner’s dilemma).
B cooperates | B defects | |
---|---|---|
A cooperates | A gets 5 B gets 5 |
A gets 2 B gets 3 |
A defects | A gets 3 B gets 2 |
A gets 1 B gets 1 |
The two players in this case are called A and B, and the choices are called cooperate and defect. Players A and B can play a single game by separately (and secretly) choosing either to cooperate or to defect. Once each player has made a choice, they announce it to the other player; and the two then look up their respective scores in the game matrix. Each entry in the matrix is a pair of numbers indicating a score for each player, depending on their choices. Thus, in the example above, if Player A chooses to cooperate while Player B defects, then A gets 2 points and B gets 3 points. If both players defect, they each get 1 point. Note, by the way, that the game matrix is a matter of public knowledge; for instance, Player A knows before the game even starts that if they and B both choose to defect, they will each get 1 point.
In an iterated game, the two players play repeatedly; thus after finishing one game, A and B may play another. (Admittedly, there is a little confusion in the terminology here; thus we refer to each iteration as a play, which constitutes a single round of the larger, iterated game.) There are a number of ways in which iterated games may be played; in the simplest situation, A and B play for some fixed number of rounds (say 200), and before each round, they are able to look at the record of all previous rounds. For instance, before playing the tenth round of their iterated game, both A and B are able to study the results of the previous nine rounds.
The game depicted by the matrix above is a particularly easy one to analyze. Let’s examine the situation from Player A’s point of view (Player B’s point of view is identical):
“Suppose B cooperates. Then I do better by cooperating myself (I receive five points instead of three). On the other hand, suppose B defects. I still do better by cooperating (since I get two points instead of one). So no matter what B does, I am better off cooperating.”
Player B will, of course, reason the same way, and both will choose to cooperate. In the terminology of game theory, both A and B have a dominant choice–i.e., a choice that gives a preferred outcome no matter what the other player chooses to do. As noted, the matrix shown above does not represent a prisoner’s dilemma situation, since when both players make their dominant choice, they also both achieve their highest personal scores. We’ll see an example of a prisoner’s dilemma game very shortly.
To re-cap: in any particular game using the above matrix, we would expect both players to cooperate; and in an iterated game, we would expect both players to cooperate repeatedly, on every round.
Now consider the following game matrix:
B cooperates | B defects | |
---|---|---|
A cooperates | A gets 2 B gets 2 |
A gets -5 B gets 3 |
A defects | A gets 3 B gets -5 |
A gets -1 B gets -1 |
In this case, Players A and B both have a dominant choice–namely, defection. No matter what Player B does, Player A will generally improve his own score (or at worst lose a little bit) by defecting, and vice versa.
However, there is something odd about this game. It seems as though the two players would benefit by choosing to cooperate. Instead of losing one point each, they could win two points each. So the “rational” choice of mutual defection has a puzzling self-destructive flavor.
The second matrix is an example of a prisoner’s dilemma game situation. Just to formalize the situation, let CC be the number of points won by each player when they both cooperate; let DD be the number of points won when both defect; let CD be the number of points won by the cooperating party when the other defects; and let DC be the number of points won by the defecting party when the other cooperates. Then the prisoner’s dilemma situation is characterized by the following conditions: \[DC > CC > DD > CD\] \[CC > \frac{DC + CD}{2}\] Note how for a given opponent’s choice, the player under consideration is always better off defecting. Hence, we see defection is the dominant choice. In the second game matrix we have: \[DC = 5,\,CC = 2,\,DD = -1,\,CD = -3\] so both conditions are met. In the story at the top, you can verify that: \[DC = 0,\,CC = -1,\,DD = -3,\,CD = -12\] Again, these values satisfy the prisoner’s dilemma conditions.
On this homework you will be exploring possible strategies for both two- and three-player prisoner’s dilemma. Before we look at the two-player program, it is worth speculating on what possible strategies might be employed in the iterated prisoner’s dilemma game. Here are some examples:
All of these strategies are somewhat simple. (Indeed, the first three do not even pay any attention to the other player; their responses are uninfluenced by the previous rounds of the game.) Nevertheless, simplicity is not necessarily a disadvantage. In a computer tournament held in the late 1970’s by the political scientist Robert Axelrod, the first-prize program employed the mimic strategy, and achieved the highest average score in a field of far more complicated programs.
A Python program for an iterated prisoner’s dilemma game is provided in prisoner2.py
as part of the code for this project. The function play_loop
pits two players (or, to be more precise, two strategies) against one another for 100 games, then prints out the average score of each player.
Player strategies are represented as functions. Each strategy takes two arguments–its own history (that is, a representation of all its previous plays) and its opponent’s history. The strategy function returns either the string "c"
for cooperate or the string "d"
for defect.
At the beginning of an iterated game, each history is empty. As the game progresses, the histories grow to represent the sequence of plays ("c"
s and "d"
s) from least recent to most recent. Note each strategy must have its own history as its first argument. So in play_loop
, strat0
has history0
as its first argument, and strat1
has history1
as its first argument. Each history is represented by a History
object. We’ll see how to define our own objects towards the end of the term, but for now we’ll focus on using them. The possible operations on a History
object are described here. These operations are functions associated with a particular object, and are called methods.
As an example, consider the get_most_recent
method. The description says it Returns the most recent action. Remember that actions are either "c"
for cooperate or "d"
for defect. Thus, it enables you to check our opponent’s most recent action, which will be useful for something like the mimic strategy. Note that it has one parameter, self
. In Python, all methods have this self
parameter. All you need to know now is that this parameter is automatically supplied and not something you provide when calling the method. So to get the most recent action of a History
stored in the variable other_history
, you would do something like
= other_history.get_most_recent() last_action
The values from the game matrix are stored in variable named rubric
using something called a dictionary (more about these later in the term). print_results
uses it to calculate the scores at the end of the iterated game. The code starting with the line
if __name__ == "__main__":
takes care of doing the simulation when you run prisoner2.py
as described in the Testing Advice section.
The betrayer
strategy is already implemented in prisoner2.py
. Implement the other two strategies that don’t consider their opponent: friend
and chaos
.
To achieve the random behavior of chaos
, you will need to use Python’s random
module (put import random
at the top of your file). It provides several different ways of generating random outcomes. I recommend using one of these two:
random.random()
will return a random decimal number between 0 and 1.0. Since every value within that range is equally likely, we can combine the returned value with an if
statement to give part of our code a certain percentage chance of happening. Specifically, we compare the random value r
to a decimal number representing the target percentage. For example, this code has a 30% chance of printing "hotdog"
and a 70% chance of printing "salad"
(70% because that’s the chance a 30% event doesn’t happen):= random.random()
r if r < 0.3:
print("hotdog")
else:
print("salad")
random.randint(start, end)
returns a random integer between start
and end
, including start
and end
among the possibilities. You might combine random.randint(0, 1)
and an if
statement to perform a sort of coin flip.Full documentation for the random
module is here.
Implement the mimic
and judge
strategies described above. As these strategies react to their opponents, you will need to make use of the operations on the opponent’s History
object. Refer to the documentation described here.
Write a new strategy careful_mimic
. The strategy should always cooperate unless the opponent defected on both of the previous two rounds. If fewer than two rounds have been played, it should cooperate.
Write a function alternating
which alternates between mimic
and judge
. That is, on the first play, it should act like mimic
. On the second play, it should act like judge
. On the third play, it should act like mimic
again, and so on. You should not copy-paste or re-implement mimic
and judge
within alternating
. Instead, call those functions from within alternating
to get what their action would be.
So far, all of our prisoner’s dilemma examples have involved two players (and, indeed, most game-theory research on the prisoner’s dilemma has focused on two-player games). But it is possible to create a prisoner’s dilemma game that involves three–or even more–players.
Strategies from the two-player game do not necessarily extend to a three-person game in a natural way. For example, what does mimic
mean? Should the player defect if either of the opponents defected on the previous round? Or only if both opponents defected? And are either of these strategies nearly as effective in the three-player game as mimic
is in the two-player game?
Before we analyze the three-player game more closely, we must introduce some notation for representing the payoffs. We use a notation similar to that used for the two-player game. For example, we let DCC represent the payoff to a defecting player if both opponents cooperate. Note that the first position represents the player under consideration. The second and third positions represent the opponents.
Another example: CCD represents the payoff to a cooperating player if one opponent cooperates and the other opponent defects. Since we assume a symmetric game matrix, CCD could be written as CDC. The choice is arbitrary.
Now we are ready to discuss the payoffs for the three-player game. We impose three rules (actually, there is no universal definition for the multi-player prisoner’s dilemma. The constraints used here represent one possible version of the three-player prisoner’s dilemma):
Defection should be the dominant choice for each player. In other words, it should always be better for a player to defect, regardless of what the opponents do. This rule gives three constraints: \[DCC > CCC\] \[DDD > CDD\] \[DCD > CCD\]
A player should always be better off if more of his opponents choose to cooperate. This rule gives: \[DCC > DCD > DDD\] \[CCC > CCD > CDD\]
If one player’s choice is fixed, the other two players should be left in a two-player prisoner’s dilemma. This rule gives the following constraints: \[CCD > DDD\] \[CCC > DCD\] \[CCD > \frac{CDD + DCD}{2}\] \[CCC > \frac{CCD + DCC}{2}\]
We can satisfy all of these constraints with the following payoffs: \[CDD=-5,\,DDD=-2,\,CCD=0,\,DCD=2,\,CCC=3,\,DCC=5\]
We have provided Python code for a three-player iterated game in prisoner3.py
. It has a play_loop
function that takes three strategies as input, keeps track of three histories, and prints out results for three players.
Write strategies betrayer3
, friend3
, and chaos3
that will work in a three-player game. They will need to have three parameters instead of two (their own history and the histories of their two opponents). Try them out to make sure your code is working.
Write two new strategies: tough_mimic
and soft_mimic
. tough_mimic
should defect if either of the opponents defected on the previous round. soft_mimic
should defect only if both opponents defected on the previous round. These will need History
operations you haven’t needed up until now.
Write a function combined
which combines the two-player strategies mimic
and judge
into a three-player strategy. combined
should play mimic
against one of the opponents, and judge
against the other opponent. Playing a strategy against an opponent just means calling the corresponding function with the appropriate History
s. It will return the action that strategy would take against that opponent. This leaves you with two possible actions, returned by either mimic
or judge
. Combine them into a single action using this simple rule: if either defected, combined
defects. Otherwise, combined
cooperates.
You don’t need to re-implement mimic
or judge
in prisoner3.py
, just import the functions you already wrote by including from prisoner2 import mimic, judge
along with the other imports.
As described earlier, Axelrod held a computer tournament to investigate the two-player prisoner’s dilemma. We are going to hold a three-player tournament. You get to design a strategy for the tournament. You might submit one of the strategies developed in the project, or develop a completely new one. The only restriction is that the strategy must work against any other legitimate entry. Any strategies that cause the tournament software to crash will be disqualified. You need to: - Include in your homework submission on Moodle a copy of your function in a Python file called tournament.py
. Include a brief description of how the strategy works in comments at the top. - The form of the submitted strategy should be a function that takes three arguments: the player’s own history list and history lists for each of the other two players. The function should return either a "c"
or a "d"
for cooperate or defect. - We reserve the right to disqualify any entries that violate the spirit of the prisoner’s dilemma game (e.g., by changing an opponent’s history). - We strongly suggest that you try out your function before submitting it.
The tournament will be a complete one, that is, every strategy plays against every other pair. Each game will consist of 100 rounds. If there are at least three submissions, the results of the tournament will be posted, so proper bragging rights can be observed.
Make sure you test your code before you submit! Just because your program runs without causing an error does not mean it is correct. Most of the points for this lab (and our labs in general) depend on your program matching the expected behavior exactly.
Below is a description of how to control which strategies get used when you run prisoner2.py
or prisoner3.py
, as well as how to get extra information about what each strategy did each round. You should use this extra information! Think about what you expect a strategy to do, then go through round-by-round and double-check it behaves accordingly. If you skip this testing step, you risk losing a lot of points on the lab without realizing it.
To run your two-player prisoner’s dilemma simulations, use the provides variables strategy1
and strategy2
to set the two strategies to test. For example,
# SELECT STRATEGIES HERE
= chaos
strategy1 = judge
strategy2 = False log
will simulate the chaos
strategy against the judge
strategy. You will see each strategy’s average score printed out in the terminal when you run prisoner2.py
. Similarly, use the provided variables strategy1
, strategy2
, and strategy3
in prisoner3.py
to set the three strategies to simulate.
Just looking at the average scores isn’t necessarily sufficient to tell whether you’ve implemented the strategies correctly. To help you investigate whether your code is correct, I have included a way to print out the actions of every player during the simulation. Change the log
variable to be True
, and a table of all the actions taken will be printed after the average scores. Examining this table carefully is a great way to figure out if your strategies are behaving the way you expect.
Submit the following files via the Lab 1 Moodle page. Remember that each file should have comments with your name, the date, and a description at the top. Note: you do not need to submit history.py
.
A file called prisoner2.py
with your solutions to Problems 1 through 3 (or 4). You should be adding on to the provided prisoner2.py
file, so your submission should also contain unchanged all the code originally in the provided file.
A file called prisoner3.py
with your solutions to Problems 5 through 6 (or 7). You should be adding on to the provided prisoner3.py
file, so your submission should also contain unchanged all the code originally in the provided file.
OPTIONALLY a file called tournament.py
with a single function that is your entry in the three-player prisoner’s dilemma tournament. There should also be a description of your strategy in comments.
This assignment is graded out of 30 points as follows:
friend
– 1 pointchaos
– 3 pointsmimic
– 3 pointsjudge
– 3 pointscareful_mimic
– 3 pointsfriend3
– 2 pointsbetrayer3
– 2 pointstough_mimic
– 4 pointssoft_mimic
– 4 points=
and operators (>
, or
, etc.) – 1 point.py
file – 0.5 pointsimport
statements together at the top (after top comments) – 0.5 pointsalternating
– 1.5 pointscombined
– 1.5 pointsCorrectness points are assessed by an auto-grading script that tests your code for correct behavior. If you submit code that crashes the autograder (because your code crashes or you used different function names than those specified in this writeup), we will attempt to fix any issues and re-grade. You will lose points for each issue we need to fix.
Style points are assessed by us reading your code.