**CS 111 w20 --- Lab 1: Prisoner's Dilemma**
*Due: Monday, January 20, at 9pm*
(##) Purpose
This lab will give you practice with functions and conditionals in the context of simulating a classic game theory problem, [The Prisoner's Dilemma](https://en.wikipedia.org/wiki/Prisoner%27s_dilemma).
(##) Logistics
You will need to download the following files to do this lab: [`prisoner2.py`](prisoner2.py), [`prisoner3.py`](prisoner3.py), and [`history.py`](history.py). Your solutions to Problems 1–3 (optionally 4) should go in `prisoner2.py` and your solutions to Problems 5–6 (optionally 7) should go in `prisoner3.py`. You will not need to modify `history.py` at all. You will need to download it into the same folder as `prisoner2.py` and `prisoner3.py`. When you need to get information about a player's history, refer to the [documentation for the `History` object](history_doc/history.html). All the operations you will need are described there. Python takes care of providing the `self` parameter for you (e.g., `get_length` is called with no arguments and `get_recent_coop` is called with one argument).
**Suggested timeline**:
- complete Problems 1 and 2 by class on Wednesday
- complete Problems 3 by class on Friday
- use the remaining time to complete Problems 5 and 6
The problems marked (OPTIONAL CHALLENGE) are optional, graded problems. Each will add two points to your lab grade if successfully completed, though they cannot raise your grade above the maximum 30 points. Problems marked (OPTIONAL) are just for fun! :)
Track how much time you spend on this homework outside class. You’re required
to include this in [what you turn in](#whattoturnin).
**Post questions to the [Moodle Forum](https://moodle.carleton.edu/mod/forum/view.php?id=507509)!**
---
# Prisoner's Dilemma
Let's begin with a story.
>Two suspects are arrested by the police. The police have insufficient
>evidence for a conviction, and, having separated the prisoners, visit each of
>them to offer the same deal. If one testifies for the prosecution against the
>other (defects) and the other remains silent (cooperates), the defector goes
>free and the silent accomplice receives the full one-year sentence. If both
>remain silent, both prisoners are sentenced to only one month in jail for a
>minor charge. If each betrays the other, each receives a three-month
>sentence. Each prisoner must choose to betray the other or to remain silent.
>Each one is assured that the other would not know about the betrayal before
>the end of the investigation. How should the prisoners act?
This story is an example of a classic game theoretic problem called the *prisoner's dilemma*. A prisoner's dilemma always involves two *game players*, and each has a choice between *cooperating* and *defecting*. If the two players cooperate, they each do moderately well; if they both defect, they each do moderately poorly. If one player cooperates and the other defects, then the defector does extremely well and the cooperator does extremely poorly. Before going any further, we'll define some game theory notation.
In game theory, we differentiate between a *game*, and a *play*. A *game* refers to the set of possible choices and outcomes for the entire range of situations. A *play* refers to a specific set of choices by the players, with the associated outcome for that particular scenario. Thus, in game theory, a *two-person binary-choice game* is represented by a two-by-two matrix. Here is a hypothetical game matrix (which, by the way, does not represent a prisoner's dilemma).
| **B** cooperates | **B** defects
---------------- | :--------------: | :-----------:
**A** cooperates | **A** gets 5
**B** gets 5 | **A** gets 2
**B** gets 3
**A** defects | **A** gets 3
**B** gets 2 | **A** gets 1
**B** gets 1
The two players in this case are called **A** and **B**, and the choices are called *cooperate* and *defect*. Players **A** and **B** can play a single game by separately (and secretly) choosing either to cooperate or to defect. Once each player has made a choice, they announce it to the other player; and the two then look up their respective scores in the game matrix. Each entry in the matrix is a pair of numbers indicating a score for each player, depending on their choices. Thus, in the example above, if Player **A** chooses to cooperate while Player **B** defects, then **A** gets 2 points and **B** gets 3 points. If both players defect, they each get 1 point. Note, by the way, that the game matrix is a matter of public knowledge; for instance, Player **A** knows before the game even starts that if they and **B** both choose to defect, they will each get 1 point.
In an *iterated game*, the two players play repeatedly; thus after finishing one game, **A** and **B** may play another. (Admittedly, there is a little confusion in the terminology here; thus we refer to each iteration as a *play*, which constitutes a single *round* of the larger, iterated game.) There are a number of ways in which iterated games may be played; in the simplest situation, **A** and **B** play for some fixed number of rounds (say 200), and before each round, they are able to look at the record of all previous rounds. For instance, before playing the tenth round of their iterated game, both **A** and **B** are able to study the results of the previous nine rounds.
## An Analysis of a Simple Game Matrix
The game depicted by the matrix above is a particularly easy one to analyze. Let's examine the situation from Player **A**'s point of view (Player **B**'s point of view is identical):
>"Suppose **B** cooperates. Then I do better by cooperating myself (I receive
>five points instead of three). On the other hand, suppose **B** defects. I
>still do better by cooperating (since I get two points instead of one). So no
>matter what **B** does, I am better off cooperating."
Player **B** will, of course, reason the same way, and both will choose to cooperate. In the terminology of game theory, both **A** and **B** have a dominant choice--i.e., a choice that gives a preferred outcome no matter what the other player chooses to do. As noted, the matrix shown above does not represent a prisoner's dilemma situation, since when both players make their dominant choice, they also both achieve their highest personal scores. We'll see an example of a prisoner's dilemma game very shortly.
**To re-cap:** in any particular game using the above matrix, we would expect both players to cooperate; and in an iterated game, we would expect both players to cooperate repeatedly, on every round.
Now consider the following game matrix:
| **B** cooperates | **B** defects
---------------- | :--------------: | :-----------:
**A** cooperates | **A** gets 2
**B** gets 2 | **A** gets -5
**B** gets 3
**A** defects | **A** gets 3
**B** gets -5 | **A** gets -1
**B** gets -1
In this case, Players **A** and **B** both have a dominant choice--namely, defection. No matter what Player **B** does, Player **A** will generally improve his own score (or at worst lose a little bit) by defecting, and vice versa.
However, there is something odd about this game. It seems as though the two players would benefit by choosing to cooperate. Instead of losing one point each, they could win two points each. So the "rational" choice of mutual defection has a puzzling self-destructive flavor.
The second matrix is an example of a prisoner's dilemma game situation. Just to formalize the situation, let CC be the number of points won by each player when they both cooperate; let DD be the number of points won when both defect; let CD be the number of points won by the cooperating party when the other defects; and let DC be the number of points won by the defecting party when the other cooperates. Then the prisoner's dilemma situation is characterized by the following conditions:
\begin{equation*}
DC > CC > DD > CD
\end{equation*}
\begin{equation*}
CC > \frac{DC + CD}{2}
\end{equation*}
Note how for a given opponent's choice, the player under consideration is always better off defecting. Hence, we see defection is the dominant choice. In the second game matrix we have:
\begin{equation*}
DC = 5,\,CC = 2,\,DD = -1,\,CD = -3
\end{equation*}
so both conditions are met. In the story at the top, you can verify that:
\begin{equation*}
DC = 0,\,CC = -1,\,DD = -3,\,CD = -12
\end{equation*}
Again, these values satisfy the prisoner's dilemma conditions.
## The Two-Player Prisoner's Dilemma Program
On this homework you will be exploring possible strategies for both two- and three-player prisoner's dilemma. Before we look at the two-player program, it is worth speculating on what possible strategies might be employed in the iterated prisoner's dilemma game. Here are some examples:
+ **Betrayer**: a program using the **betrayer** strategy simply defects on every round of every game.
+ **Friend**: a program using the **friend** strategy cooperates on every round of every game.
+ **Chaos**: this program cooperates or defects randomly, with an equal chance of each.
+ **Judge**: this program cooperates on the first round. On all subsequent rounds, **judge** examines the history of the other player's actions, counting the total number of defections and cooperations by the other player. If the other player's defections outnumber their cooperations, **judge** will defect; otherwise this strategy will cooperate.
+ **Mimic**: this program cooperates on the first round, and then on every subsequent round it mimics the other player's previous move. Thus, if the other player cooperates (defects) in round $n$, then **mimic** will cooperate (defect) in round $n + 1$.
All of these strategies are somewhat simple. (Indeed, the first three do not even pay any attention to the other player; their responses are uninfluenced by the previous rounds of the game.) Nevertheless, simplicity is not necessarily a disadvantage. In a computer tournament held in the late 1970's by the political scientist Robert Axelrod, the first-prize program employed the **mimic** strategy, and achieved the highest average score in a field of far more complicated programs.
A Python program for an iterated prisoner's dilemma game is provided in [`prisoner2.py`](prisoner2.py) as part of the code for this project. The function `play_loop` pits two players (or, to be more precise, two *strategies*) against one another for 100 games, then prints out the average score of each player.
Player strategies are represented as functions. Each strategy takes two arguments--its own *history* (that is, a representation of all its previous *plays*) and its opponent's *history*. The strategy function returns either the string `"c"` for *cooperate* or the string `"d"` for *defect*.
At the beginning of an iterated game, each history is empty. As the game progresses, the histories grow to represent the sequence of plays (`"c"`s and `"d"`s) from least recent to most recent. Note each strategy must have its own history as its first argument. So in `play_loop`, `strat0` has `history0` as its first argument, and `strat1` has `history1` as its first argument. Each history is represented by a `History` object. We'll see how to define our own objects towards the end of the term, but for now we'll focus on using them. The possible operations on a `History` object are described [here](history_doc/history.html). These operations are functions associated with a particular object, and are called *methods*.
As an example, consider the [`get_most_recent`](history_doc/history.html#history.History.get_most_recent) method. The description says it *Returns the most recent action*. Remember that actions are either `"c"` for cooperate or `"d"` for defect. Thus, it enables you to check our opponent's most recent action, which will be useful for something like the **mimic** strategy. Note that it has one parameter, `self`. In Python, all methods have this `self` parameter. All you need to know now is that this parameter is automatically supplied and not something you provide when calling the method. So to get the most recent action of a `History` stored in the variable `other_history`, you would do something like
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Python
last_action = other_history.get_most_recent()
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The values from the game matrix are stored in variable named `rubric` using something called a *dictionary* (more about these later in the term). `print_results` uses it to calculate the scores at the end of the iterated game. The code starting with the line
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Python
if __name__ == "__main__":
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
takes care of doing the simulation when you run `prisoner2.py` as described in the [Testing Advice](#testingadvice) section.
(###) Problem 1
The `betrayer` strategy is already implemented in `prisoner2.py`. Implement the other two strategies that don't consider their opponent: `friend` and `chaos`.
To achieve the random behavior of `chaos`, you will need to use Python's `random` module (put `import random` at the top of your file). It provides several different ways of generating random outcomes. I recommend using one of these two:
+ [`random.random()`](https://docs.python.org/3.7/library/random.html#random.random) will return a random decimal number between 0 and 1.0. Since every value within that range is equally likely, we can combine the returned value with an `if` statement to give part of our code a certain percentage chance of happening. Specifically, we compare the random value `r` to a decimal number representing the target percentage. For example, this code has a 30% chance of printing `"hotdog"` and a 70% chance of printing `"salad"` (70% because that's the chance a 30% event doesn't happen):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Python
r = random.random()
if r < 0.3:
print("hotdog")
else:
print("salad")
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ [`random.randint(start, end)`](https://docs.python.org/3.7/library/random.html#random.randint) returns a random integer between `start` and `end`, including `start` and `end` among the possibilities. You might combine `random.randint(0, 1)` and an `if` statement to perform a sort of *coin flip*.
Full documentation for the `random` module is [here](https://docs.python.org/3.7/library/random.html).
(###) Problem 2
Implement the `mimic` and `judge` strategies described above. As these strategies react to their opponents, you will need to make use of the operations on the opponent's `History` object. Refer to the documentation described [here](history_doc/history.html).
(###) Problem 3
Write a new strategy `careful_mimic`. The strategy should always cooperate unless the opponent defected on both of the previous two rounds. If fewer than two rounds have been played, it should cooperate.
(###) (OPTIONAL CHALLENGE) Problem 4
Write a function `alternating` which alternates between `mimic` and `judge`. That is, on the first play, it should act like `mimic`. On the second play, it should act like `judge`. On the third play, it should act like `mimic` again, and so on. You should not copy-paste or re-implement `mimic` and `judge` within `alternating`. Instead, call those functions from within `alternating` to get what their action would be.
## The Three-Player Prisoner’s Dilemma
So far, all of our prisoner's dilemma examples have involved two players (and, indeed, most game-theory research on the prisoner's dilemma has focused on two-player games). But it is possible to create a prisoner's dilemma game that involves three--or even more--players.
Strategies from the two-player game do not necessarily extend to a three-person game in a natural way. For example, what does `mimic` mean? Should the player defect if either of the opponents defected on the previous round? Or only if both opponents defected? And are either of these strategies nearly as effective in the three-player game as `mimic` is in the two-player game?
Before we analyze the three-player game more closely, we must introduce some notation for representing the payoffs. We use a notation similar to that used for the two-player game. For example, we let DCC represent the payoff to a defecting player if both opponents cooperate. Note that the first position represents the player under consideration. The second and third positions represent the opponents.
Another example: CCD represents the payoff to a cooperating player if one opponent cooperates and the other opponent defects. Since we assume a symmetric game matrix, CCD could be written as CDC. The choice is arbitrary.
Now we are ready to discuss the payoffs for the three-player game. We impose three rules (actually, there is no universal definition for the multi-player prisoner's dilemma. The constraints used here represent one possible version of the three-player prisoner's dilemma):
1. Defection should be the dominant choice for each player. In other words, it should always be better for a player to defect, regardless of what the opponents do. This rule gives three constraints:
\[DCC > CCC\]
\[DDD > CDD\]
\[DCD > CCD\]
2. A player should always be better off if more of his opponents choose to cooperate. This rule gives:
\[DCC > DCD > DDD\]
\[CCC > CCD > CDD\]
3. If one player's choice is fixed, the other two players should be left in a two-player prisoner's dilemma. This rule gives the following constraints:
\[CCD > DDD\]
\[CCC > DCD\]
\[CCD > \frac{CDD + DCD}{2}\]
\[CCC > \frac{CCD + DCC}{2}\]
We can satisfy all of these constraints with the following payoffs:
\[CDD=-5,\,DDD=-2,\,CCD=0,\,DCD=2,\,CCC=3,\,DCC=5\]
We have provided Python code for a three-player iterated game in [`prisoner3.py`](prisoner3.py). It has a `play_loop` function that takes three strategies as input, keeps track of three histories, and prints out results for three players.
(###) Problem 5
Write strategies `betrayer3`, `friend3`, and `chaos3` that will work in a three-player game. They will need to have three parameters instead of two (their own history and the histories of their two opponents). Try them out to make sure your code is working.
(###) Problem 6
Write two new strategies: `tough_mimic` and `soft_mimic`. `tough_mimic` should defect if either of the opponents defected on the previous round. `soft_mimic` should defect only if both opponents defected on the previous round. These will need [`History` operations](history_doc/history.html) you haven't needed up until now.
(###) (OPTIONAL CHALLENGE) Problem 7
Write a function `combined` which combines the two-player strategies `mimic` and `judge` into a three-player strategy. `combined` should play `mimic` against one of the opponents, and `judge` against the other opponent. *Playing* a strategy against an opponent just means calling the corresponding function with the appropriate `History`s. It will return the action that strategy would take against that opponent. This leaves you with two possible actions, returned by either `mimic` or `judge`. Combine them into a single action using this simple rule: if either defected, `combined` defects. Otherwise, `combined` cooperates.
You don't need to re-implement `mimic` or `judge` in `prisoner3.py`, just import the functions you already wrote by including `from prisoner2 import mimic, judge` along with the other imports.
(###) (OPTIONAL) Problem 8: The Three-Player Prisoner's Dilemma Tournament
As described earlier, Axelrod held a computer tournament to investigate the two-player prisoner's dilemma. We are going to hold a three-player tournament. You get to design a strategy for the tournament. You might submit one of the strategies developed in the project, or develop a completely new one. The only restriction is that the strategy must work against any other legitimate entry. Any strategies that cause the tournament software to crash will be disqualified. You need to:
- Include in your homework submission on Moodle a copy of your function in a Python file called `tournament.py`. Include a brief description of how the strategy works in comments at the top.
- The form of the submitted strategy should be a function that takes three arguments: the player's own history list and history lists for each of the other two players. The function should return either a `"c"` or a `"d"` for cooperate or defect.
- We reserve the right to disqualify any entries that violate the spirit of the prisoner's dilemma game (e.g., by changing an opponent's history).
- We strongly suggest that you try out your function before submitting it.
The tournament will be a complete one, that is, every strategy plays against every other pair. Each game will consist of 100 rounds. If there are at least three submissions, the results of the tournament will be posted, so proper bragging rights can be observed.
# Testing Advice
To run your two-player prisoner's dilemma simulations, execute
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ bash
python3 prisoner2.py STRATEGY1 STRATEGY2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
from the terminal. If you're on Windows, use `python` instead of `python3`. Replace `STRATEGY1` and `STRATEGY2` with the strategies you want the two players to use. They must be names of strategy functions in your file, like `betrayer` or `mimic`. In Visual Studio Code, you can get to the terminal by going to the *View* menu and selecting *Terminal*. Assuming your code doesn't have any errors, you should see the average scores of each player printed out.
Three-player simulations are run the same way, via
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ bash
python3 prisoner3.py STRATEGY1 STRATEGY2 STRATEGY3
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To help you investigate whether your code is correct, I have included a way to print out the actions of every player during the simulation. Add `--log` onto the end of either of the above commands (with a space between it and the last strategy name), and a table of all the actions taken will be printed after the average scores. Examining this table carefully is a great way to figure out if your strategies are behaving the way you expect.
# What to Turn In
Submit the following files via the [Lab 1 Moodle page](https://moodle.carleton.edu/mod/assign/view.php?id=507492). Remember that each file should have comments with your name, the date, and a description at the top. Note: you do **not** need to submit `history.py`.
- A file called `prisoner2.py` with your solutions to Problems 1 through 3 (or 4). You should be adding on to the provided [`prisoner2.py`](prisoner2.py) file, so your submission should also contain unchanged all the code originally in the provided file.
- A file called `prisoner3.py` with your solutions to Problems 5 through 6 (or 7). You should be adding on to the provided [`prisoner3.py`](prisoner3.py) file, so your submission should also contain unchanged all the code originally in the provided file.
- OPTIONALLY a file called `tournament.py` with a single function that is your entry in the three-player prisoner's dilemma tournament. There should also be a description of your strategy in comments.
- A file called `feedback.txt` with the following information (you get points for answering these questions!):
- How many hours you spent outside of class on this homework.
- The difficulty of this homework for this point in a 100-level course as: too easy, easy, moderate, challenging, or too hard.
- What did you learn on this homework (very briefly)? Rate the educational value relative to the time invested from 1 (low) to 5 (high).
# Grading
This assignment is graded out of 30 points as follows:
- Submitting `feedback.txt` -- 3 points
- Correctness -- 23 points
+ `friend` -- 1 point
+ `chaos` -- 3 points
+ `mimic` -- 3 points
+ `judge` -- 3 points
+ `careful_mimic` -- 3 points
+ `friend3` -- 1 point
+ `betrayer3` -- 1 point
+ `tough_mimic` -- 4 points
+ `soft_mimic` -- 4 points
- Style -- 4 points
+ Good coding style including putting spaces around `=` and operators (`>`, `or`, etc.) -- 1 point
+ Blank lines between function definitions -- 0.5 points
+ Descriptive variable names -- 1 point
+ Comments with name, date, and purpose at the top of each `.py` file -- 1 point
+ All `import` statements together at the top (after top comments) -- 0.5 points
Correctness points are assessed by an auto-grading script that tests your code for correct behavior. If you submit code that crashes the autograder (because your code crashes or you used different function names than those specified in this writeup), we will attempt to fix any issues and re-grade. You will lose points for each issue we need to fix.
Style points are assessed by us reading your code.