CS 111 Lab 1

Aaron Bauer

September 21, 2021

CS 111 f21 — Lab 1: Prisoner’s Dilemma

Assigned: Wednesday, September 22
Due: 9pm Wednesday, September 29
Collaboration: This is an individual assignment.

Post questions to the Moodle Forum!

Purpose

This lab will give you practice with functions and conditionals in the context of simulating a classic game theory problem, The Prisoner’s Dilemma.

Logistics

You will need to download the following files to do this lab: prisoner2.py, prisoner3.py, and history.py. Your solutions to Problems 1–3 (optionally 4) should go in prisoner2.py and your solutions to Problems 5–6 (optionally 7) should go in prisoner3.py. You will not need to modify history.py at all. You will need to download it into the same folder as prisoner2.py and prisoner3.py. When you need to get information about a player’s history, refer to the documentation for the History object. All the operations you will need are described there. Python takes care of providing the self parameter for you (e.g., get_length is called with no arguments and get_recent_coop is called with one argument).

Suggested timeline:

complete Problem 1 by class on Friday
complete Problems 2 and 3 by class on Monday
use the remaining time to complete Problems 5 and 6

Challenges

The problems marked (CHALLENGE) are a way to go above and beyond what is expected for a lab. Only attempt them after you are completely finished with all other parts of the lab. Completing all the non-challenge components of a lab will earn 28.5/30 (95%, an A). Earning challenge points can result in a grade higher than 100%.

Problems marked (OPTIONAL) are just for fun! :)

Prisoner’s Dilemma

Let’s begin with a story.

Two suspects are arrested by the police. The police have insufficient evidence for a conviction, and, having separated the prisoners, visit each of them to offer the same deal. If one testifies for the prosecution against the other (defects) and the other remains silent (cooperates), the defector goes free and the silent accomplice receives the full one-year sentence. If both remain silent, both prisoners are sentenced to only one month in jail for a minor charge. If each betrays the other, each receives a three-month sentence. Each prisoner must choose to betray the other or to remain silent. Each one is assured that the other would not know about the betrayal before the end of the investigation. How should the prisoners act?

This story is an example of a classic game theoretic problem called the prisoner’s dilemma. A prisoner’s dilemma always involves two game players, and each has a choice between cooperating and defecting. If the two players cooperate, they each do moderately well; if they both defect, they each do moderately poorly. If one player cooperates and the other defects, then the defector does extremely well and the cooperator does extremely poorly. Before going any further, we’ll define some game theory notation.

In game theory, we differentiate between a game, and a play. A game refers to the set of possible choices and outcomes for the entire range of situations. A play refers to a specific set of choices by the players, with the associated outcome for that particular scenario. Thus, in game theory, a two-person binary-choice game is represented by a two-by-two matrix. Here is a hypothetical game matrix (which, by the way, does not represent a prisoner’s dilemma).

	B cooperates	B defects
A cooperates	A gets 5 B gets 5	A gets 2 B gets 3
A defects	A gets 3 B gets 2	A gets 1 B gets 1

The two players in this case are called A and B, and the choices are called cooperate and defect. Players A and B can play a single game by separately (and secretly) choosing either to cooperate or to defect. Once each player has made a choice, they announce it to the other player; and the two then look up their respective scores in the game matrix. Each entry in the matrix is a pair of numbers indicating a score for each player, depending on their choices. Thus, in the example above, if Player A chooses to cooperate while Player B defects, then A gets 2 points and B gets 3 points. If both players defect, they each get 1 point. Note, by the way, that the game matrix is a matter of public knowledge; for instance, Player A knows before the game even starts that if they and B both choose to defect, they will each get 1 point.

In an iterated game, the two players play repeatedly; thus after finishing one game, A and B may play another. (Admittedly, there is a little confusion in the terminology here; thus we refer to each iteration as a play, which constitutes a single round of the larger, iterated game.) There are a number of ways in which iterated games may be played; in the simplest situation, A and B play for some fixed number of rounds (say 200), and before each round, they are able to look at the record of all previous rounds. For instance, before playing the tenth round of their iterated game, both A and B are able to study the results of the previous nine rounds.

An Analysis of a Simple Game Matrix

The game depicted by the matrix above is a particularly easy one to analyze. Let’s examine the situation from Player A’s point of view (Player B’s point of view is identical):

“Suppose B cooperates. Then I do better by cooperating myself (I receive five points instead of three). On the other hand, suppose B defects. I still do better by cooperating (since I get two points instead of one). So no matter what B does, I am better off cooperating.”

Player B will, of course, reason the same way, and both will choose to cooperate. In the terminology of game theory, both A and B have a dominant choice–i.e., a choice that gives a preferred outcome no matter what the other player chooses to do. As noted, the matrix shown above does not represent a prisoner’s dilemma situation, since when both players make their dominant choice, they also both achieve their highest personal scores. We’ll see an example of a prisoner’s dilemma game very shortly.

To re-cap: in any particular game using the above matrix, we would expect both players to cooperate; and in an iterated game, we would expect both players to cooperate repeatedly, on every round.

Now consider the following game matrix:

	B cooperates	B defects
A cooperates	A gets 2 B gets 2	A gets -5 B gets 3
A defects	A gets 3 B gets -5	A gets -1 B gets -1

In this case, Players A and B both have a dominant choice–namely, defection. No matter what Player B does, Player A will generally improve his own score (or at worst lose a little bit) by defecting, and vice versa.

However, there is something odd about this game. It seems as though the two players would benefit by choosing to cooperate. Instead of losing one point each, they could win two points each. So the “rational” choice of mutual defection has a puzzling self-destructive flavor.

The second matrix is an example of a prisoner’s dilemma game situation. Just to formalize the situation, let CC be the number of points won by each player when they both cooperate; let DD be the number of points won when both defect; let CD be the number of points won by the cooperating party when the other defects; and let DC be the number of points won by the defecting party when the other cooperates. Then the prisoner’s dilemma situation is characterized by the following conditions: \[DC > CC > DD > CD\] \[CC > \frac{DC + CD}{2}\] Note how for a given opponent’s choice, the player under consideration is always better off defecting. Hence, we see defection is the dominant choice. In the second game matrix we have: \[DC = 5,\,CC = 2,\,DD = -1,\,CD = -3\] so both conditions are met. In the story at the top, you can verify that: \[DC = 0,\,CC = -1,\,DD = -3,\,CD = -12\] Again, these values satisfy the prisoner’s dilemma conditions.

The Two-Player Prisoner’s Dilemma Program

On this homework you will be exploring possible strategies for both two- and three-player prisoner’s dilemma. Before we look at the two-player program, it is worth speculating on what possible strategies might be employed in the iterated prisoner’s dilemma game. Here are some examples:

Betrayer: a program using the betrayer strategy simply defects on every round of every game.
Friend: a program using the friend strategy cooperates on every round of every game.
Chaos: this program cooperates or defects randomly, with an equal chance of each.
Judge: this program cooperates on the first round. On all subsequent rounds, judge examines the history of the other player’s actions, counting the total number of defections and cooperations by the other player. If the other player’s defections outnumber their cooperations, judge will defect; otherwise this strategy will cooperate.
Mimic: this program cooperates on the first round, and then on every subsequent round it mimics the other player’s previous move. Thus, if the other player cooperates (defects) in round \(n\), then mimic will cooperate (defect) in round \(n + 1\).

All of these strategies are somewhat simple. (Indeed, the first three do not even pay any attention to the other player; their responses are uninfluenced by the previous rounds of the game.) Nevertheless, simplicity is not necessarily a disadvantage. In a computer tournament held in the late 1970’s by the political scientist Robert Axelrod, the first-prize program employed the mimic strategy, and achieved the highest average score in a field of far more complicated programs.

A Python program for an iterated prisoner’s dilemma game is provided in prisoner2.py as part of the code for this project. The function play_loop pits two players (or, to be more precise, two strategies) against one another for 100 games, then prints out the average score of each player.

Player strategies are represented as functions. Each strategy takes two arguments–its own history (that is, a representation of all its previous plays) and its opponent’s history. The strategy function returns either the string "c" for cooperate or the string "d" for defect.

At the beginning of an iterated game, each history is empty. As the game progresses, the histories grow to represent the sequence of plays ("c"s and "d"s) from least recent to most recent. Note each strategy must have its own history as its first argument. So in play_loop, strat0 has history0 as its first argument, and strat1 has history1 as its first argument. Each history is represented by a History object. We’ll see how to define our own objects towards the end of the term, but for now we’ll focus on using them. The possible operations on a History object are described here. These operations are functions associated with a particular object, and are called methods.

As an example, consider the get_most_recent method. The description says it Returns the most recent action. Remember that actions are either "c" for cooperate or "d" for defect. Thus, it enables you to check our opponent’s most recent action, which will be useful for something like the mimic strategy. Note that it has one parameter, self. In Python, all methods have this self parameter. All you need to know now is that this parameter is automatically supplied and not something you provide when calling the method. So to get the most recent action of a History stored in the variable other_history, you would do something like

last_action = other_history.get_most_recent()

The values from the game matrix are stored in variable named rubric using something called a dictionary (more about these later in the term). print_results uses it to calculate the scores at the end of the iterated game. The code starting with the line

if __name__ == "__main__":

takes care of doing the simulation when you run prisoner2.py as described in the Testing Advice section.

Problem 1

The betrayer strategy is already implemented in prisoner2.py. Implement the other two strategies that don’t consider their opponent: friend and chaos.

To achieve the random behavior of chaos, you will need to use Python’s random module (put import random at the top of your file). It provides several different ways of generating random outcomes. I recommend using one of these two:

random.random() will return a random decimal number between 0 and 1.0. Since every value within that range is equally likely, we can combine the returned value with an if statement to give part of our code a certain percentage chance of happening. Specifically, we compare the random value r to a decimal number representing the target percentage. For example, this code has a 30% chance of printing "hotdog" and a 70% chance of printing "salad" (70% because that’s the chance a 30% event doesn’t happen):

r = random.random()
if r < 0.3:
    print("hotdog")
else:
    print("salad")

random.randint(start, end) returns a random integer between start and end, including start and end among the possibilities. You might combine random.randint(0, 1) and an if statement to perform a sort of coin flip.

Full documentation for the random module is here.

Problem 2

Implement the mimic and judge strategies described above. As these strategies react to their opponents, you will need to make use of the operations on the opponent’s History object. Refer to the documentation described here.

Problem 3

Write a new strategy careful_mimic. The strategy should always cooperate unless the opponent defected on both of the previous two rounds. If fewer than two rounds have been played, it should cooperate.

(CHALLENGE) Problem 4

Write a function alternating which alternates between mimic and judge. That is, on the first play, it should act like mimic. On the second play, it should act like judge. On the third play, it should act like mimic again, and so on. You should not copy-paste or re-implement mimic and judge within alternating. Instead, call those functions from within alternating to get what their action would be.

The Three-Player Prisoner’s Dilemma

So far, all of our prisoner’s dilemma examples have involved two players (and, indeed, most game-theory research on the prisoner’s dilemma has focused on two-player games). But it is possible to create a prisoner’s dilemma game that involves three–or even more–players.

Strategies from the two-player game do not necessarily extend to a three-person game in a natural way. For example, what does mimic mean? Should the player defect if either of the opponents defected on the previous round? Or only if both opponents defected? And are either of these strategies nearly as effective in the three-player game as mimic is in the two-player game?

Before we analyze the three-player game more closely, we must introduce some notation for representing the payoffs. We use a notation similar to that used for the two-player game. For example, we let DCC represent the payoff to a defecting player if both opponents cooperate. Note that the first position represents the player under consideration. The second and third positions represent the opponents.

Another example: CCD represents the payoff to a cooperating player if one opponent cooperates and the other opponent defects. Since we assume a symmetric game matrix, CCD could be written as CDC. The choice is arbitrary.

Now we are ready to discuss the payoffs for the three-player game. We impose three rules (actually, there is no universal definition for the multi-player prisoner’s dilemma. The constraints used here represent one possible version of the three-player prisoner’s dilemma):

Defection should be the dominant choice for each player. In other words, it should always be better for a player to defect, regardless of what the opponents do. This rule gives three constraints: \[DCC > CCC\] \[DDD > CDD\] \[DCD > CCD\]
A player should always be better off if more of his opponents choose to cooperate. This rule gives: \[DCC > DCD > DDD\] \[CCC > CCD > CDD\]
If one player’s choice is fixed, the other two players should be left in a two-player prisoner’s dilemma. This rule gives the following constraints: \[CCD > DDD\] \[CCC > DCD\] \[CCD > \frac{CDD + DCD}{2}\] \[CCC > \frac{CCD + DCC}{2}\]

We can satisfy all of these constraints with the following payoffs: \[CDD=-5,\,DDD=-2,\,CCD=0,\,DCD=2,\,CCC=3,\,DCC=5\]

We have provided Python code for a three-player iterated game in prisoner3.py. It has a play_loop function that takes three strategies as input, keeps track of three histories, and prints out results for three players.

Problem 5

Write strategies betrayer3, friend3, and chaos3 that will work in a three-player game. They will need to have three parameters instead of two (their own history and the histories of their two opponents). Try them out to make sure your code is working.

Problem 6

Write two new strategies: tough_mimic and soft_mimic. tough_mimic should defect if either of the opponents defected on the previous round. soft_mimic should defect only if both opponents defected on the previous round. These will need History operations you haven’t needed up until now.

(CHALLENGE) Problem 7

Write a function combined which combines the two-player strategies mimic and judge into a three-player strategy. combined should play mimic against one of the opponents, and judge against the other opponent. Playing a strategy against an opponent just means calling the corresponding function with the appropriate Historys. It will return the action that strategy would take against that opponent. This leaves you with two possible actions, returned by either mimic or judge. Combine them into a single action using this simple rule: if either defected, combined defects. Otherwise, combined cooperates.

You don’t need to re-implement mimic or judge in prisoner3.py, just import the functions you already wrote by including from prisoner2 import mimic, judge along with the other imports.

(OPTIONAL) Problem 8: The Three-Player Prisoner’s Dilemma Tournament

As described earlier, Axelrod held a computer tournament to investigate the two-player prisoner’s dilemma. We are going to hold a three-player tournament. You get to design a strategy for the tournament. You might submit one of the strategies developed in the project, or develop a completely new one. The only restriction is that the strategy must work against any other legitimate entry. Any strategies that cause the tournament software to crash will be disqualified. You need to: - Include in your homework submission on Moodle a copy of your function in a Python file called tournament.py. Include a brief description of how the strategy works in comments at the top. - The form of the submitted strategy should be a function that takes three arguments: the player’s own history list and history lists for each of the other two players. The function should return either a "c" or a "d" for cooperate or defect. - We reserve the right to disqualify any entries that violate the spirit of the prisoner’s dilemma game (e.g., by changing an opponent’s history). - We strongly suggest that you try out your function before submitting it.

The tournament will be a complete one, that is, every strategy plays against every other pair. Each game will consist of 100 rounds. If there are at least three submissions, the results of the tournament will be posted, so proper bragging rights can be observed.

Testing Advice

Testing

Make sure you test your code before you submit! Just because your program runs without causing an error does not mean it is correct. Most of the points for this lab (and our labs in general) depend on your program matching the expected behavior exactly.

Below is a description of how to control which strategies get used when you run prisoner2.py or prisoner3.py, as well as how to get extra information about what each strategy did each round. You should use this extra information! Think about what you expect a strategy to do, then go through round-by-round and double-check it behaves accordingly. If you skip this testing step, you risk losing a lot of points on the lab without realizing it.

To run your two-player prisoner’s dilemma simulations, use the provides variables strategy1 and strategy2 to set the two strategies to test. For example,

# SELECT STRATEGIES HERE
strategy1 = chaos
strategy2 = judge
log = False

will simulate the chaos strategy against the judge strategy. You will see each strategy’s average score printed out in the terminal when you run prisoner2.py. Similarly, use the provided variables strategy1, strategy2, and strategy3 in prisoner3.py to set the three strategies to simulate.

Just looking at the average scores isn’t necessarily sufficient to tell whether you’ve implemented the strategies correctly. To help you investigate whether your code is correct, I have included a way to print out the actions of every player during the simulation. Change the log variable to be True, and a table of all the actions taken will be printed after the average scores. Examining this table carefully is a great way to figure out if your strategies are behaving the way you expect.

What to Turn In

Submit the following files via the Lab 1 Moodle page. Remember that each file should have comments with your name, the date, and a description at the top. Note: you do not need to submit history.py.

A file called prisoner2.py with your solutions to Problems 1 through 3 (or 4). You should be adding on to the provided prisoner2.py file, so your submission should also contain unchanged all the code originally in the provided file.
A file called prisoner3.py with your solutions to Problems 5 through 6 (or 7). You should be adding on to the provided prisoner3.py file, so your submission should also contain unchanged all the code originally in the provided file.
OPTIONALLY a file called tournament.py with a single function that is your entry in the three-player prisoner’s dilemma tournament. There should also be a description of your strategy in comments.

Grading

This assignment is graded out of 30 points as follows:

Correctness – 25 points
- friend – 1 point
- chaos – 3 points
- mimic – 3 points
- judge – 3 points
- careful_mimic – 3 points
- friend3 – 2 points
- betrayer3 – 2 points
- tough_mimic – 4 points
- soft_mimic – 4 points
Style – 3.5 points
- Good coding style including putting spaces around = and operators (>, or, etc.) – 1 point
- Blank lines between function definitions – 0.5 points
- Descriptive variable names – 1 point
- Comments with name, date, and purpose at the top of each .py file – 0.5 points
- All import statements together at the top (after top comments) – 0.5 points
Challenges – up to 3 points
- alternating – 1.5 points
- combined – 1.5 points

Correctness points are assessed by an auto-grading script that tests your code for correct behavior. If you submit code that crashes the autograder (because your code crashes or you used different function names than those specified in this writeup), we will attempt to fix any issues and re-grade. You will lose points for each issue we need to fix.

Style points are assessed by us reading your code.