**CS 111 f19 - Homework 2 (Pair Programming)** *Due: Friday, October 4, at 9pm* You will need to download the following files to do this homework: [`prisoner2.py`](prisoner2.py), [`prisoner3.py`](prisoner3.py), and [`history.py`](history.py). Your solutions to Problems 1–4 should go in `prisoner2.py` and your solutions to Problems 5–7 should go in `prisoner3.py`. You will not need to modify `history.py` at all, but it must be in the same directory as `prisoner2.py` and `prisoner3.py` for them to work. Post questions to the [Homework 2 forum](https://moodle.carleton.edu/mod/forum/view.php?id=486426)! --- # Prisoner's Dilemma Let's begin with a story. >Two suspects are arrested by the police. The police have insufficient >evidence for a conviction, and, having separated the prisoners, visit each of >them to offer the same deal. If one testifies for the prosecution against the >other (defects) and the other remains silent (cooperates), the defector goes >free and the silent accomplice receives the full one-year sentence. If both >remain silent, both prisoners are sentenced to only one month in jail for a >minor charge. If each betrays the other, each receives a three-month >sentence. Each prisoner must choose to betray the other or to remain silent. >Each one is assured that the other would not know about the betrayal before >the end of the investigation. How should the prisoners act? This story is an example of a classic game theoretic problem called the *prisoner's dilemma*. A prisoner's dilemma always involves two *game players*, and each has a choice between *cooperating* and *defecting*. If the two players cooperate, they each do moderately well; if they both defect, they each do moderately poorly. If one player cooperates and the other defects, then the defector does extremely well and the cooperator does extremely poorly. Before going any further, we'll define some game theory notation. In game theory, we differentiate between a *game*, and a *play*. A *game* refers to the set of possible choices and outcomes for the entire range of situations. A *play* refers to a specific set of choices by the players, with the associated outcome for that particular scenario. Thus, in game theory, a *two-person binary-choice game* is represented by a two-by-two matrix. Here is a hypothetical game matrix (which, by the way, does not represent a prisoner's dilemma). | **B** cooperates | **B** defects ---------------- | :--------------: | :-----------: **A** cooperates | **A** gets 5
**B** gets 5 | **A** gets 2
**B** gets 3 **A** defects | **A** gets 3
**B** gets 2 | **A** gets 1
**B** gets 1 The two players in this case are called **A** and **B**, and the choices are called *cooperate* and *defect*. Players **A** and **B** can play a single game by separately (and secretly) choosing either to cooperate or to defect. Once each player has made a choice, they announce it to the other player; and the two then look up their respective scores in the game matrix. Each entry in the matrix is a pair of numbers indicating a score for each player, depending on their choices. Thus, in the example above, if Player **A** chooses to cooperate while Player **B** defects, then **A** gets 2 points and **B** gets 3 points. If both players defect, they each get 1 point. Note, by the way, that the game matrix is a matter of public knowledge; for instance, Player **A** knows before the game even starts that if they and **B** both choose to defect, they will each get 1 point. In an *iterated game*, the two players play repeatedly; thus after finishing one game, **A** and **B** may play another. (Admittedly, there is a little confusion in the terminology here; thus we refer to each iteration as a *play*, which constitutes a single *round* of the larger, iterated game.) There are a number of ways in which iterated games may be played; in the simplest situation, **A** and **B** play for some fixed number of rounds (say 200), and before each round, they are able to look at the record of all previous rounds. For instance, before playing the tenth round of their iterated game, both **A** and **B** are able to study the results of the previous nine rounds. ## An Analysis of a Simple Game Matrix The game depicted by the matrix above is a particularly easy one to analyze. Let's examine the situation from Player **A**'s point of view (Player **B**'s point of view is identical): >"Suppose **B** cooperates. Then I do better by cooperating myself (I receive >five points instead of three). On the other hand, suppose **B** defects. I >still do better by cooperating (since I get two points instead of one). So no >matter what **B** does, I am better off cooperating." Player **B** will, of course, reason the same way, and both will choose to cooperate. In the terminology of game theory, both **A** and **B** have a dominant choice--i.e., a choice that gives a preferred outcome no matter what the other player chooses to do. As noted, the matrix shown above does not represent a prisoner's dilemma situation, since when both players make their dominant choice, they also both achieve their highest personal scores. We'll see an example of a prisoner's dilemma game very shortly. **To re-cap:** in any particular game using the above matrix, we would expect both players to cooperate; and in an iterated game, we would expect both players to cooperate repeatedly, on every round. Now consider the following game matrix: | **B** cooperates | **B** defects ---------------- | :--------------: | :-----------: **A** cooperates | **A** gets 2
**B** gets 2 | **A** gets -5
**B** gets 3 **A** defects | **A** gets 3
**B** gets -5 | **A** gets -1
**B** gets -1 In this case, Players **A** and **B** both have a dominant choice--namely, defection. No matter what Player **B** does, Player **A** will generally improve his own score (or at worst lose a little bit) by defecting, and vice versa. However, there is something odd about this game. It seems as though the two players would benefit by choosing to cooperate. Instead of losing one point each, they could win two points each. So the "rational" choice of mutual defection has a puzzling self-destructive flavor. The second matrix is an example of a prisoner's dilemma game situation. Just to formalize the situation, let CC be the number of points won by each player when they both cooperate; let DD be the number of points won when both defect; let CD be the number of points won by the cooperating party when the other defects; and let DC be the number of points won by the defecting party when the other cooperates. Then the prisoner's dilemma situation is characterized by the following conditions: \begin{equation*} DC > CC > DD > CD \end{equation*} \begin{equation*} CC > \frac{DC + CD}{2} \end{equation*} Note how for a given opponent's choice, the player under consideration is always better off defecting. Hence, we see defection is the dominant choice. In the second game matrix we have: \begin{equation*} DC = 5,\,CC = 2,\,DD = -1,\,CD = -3 \end{equation*} so both conditions are met. In the story at the top, you can verify that: \begin{equation*} DC = 0,\,CC = -1,\,DD = -3,\,CD = -12 \end{equation*} Again, these values satisfy the prisoner's dilemma conditions. ## The Two-Player Prisoner's Dilemma Program On this homework you will be exploring possible strategies for both two- and three-player prisoner's dilemma. This will involve both coding these strategies and then describing their behavior. Hence, **you will turn in both your code and a document with your descriptions**. Before we look at the two-player program, it is worth speculating on what possible strategies might be employed in the iterated prisoner's dilemma game. Here are some examples: + **Betrayer**: a program using the **betrayer** strategy simply defects on every round of every game. + **Friend**: a program using the **friend** strategy cooperates on every round of every game. + **Chaos**: this program cooperates or defects randomly, with an equal chance of each. + **Judge**: this program cooperates on the first round. On all subsequent rounds, **judge** examines the history of the other player's actions, counting the total number of defections and cooperations by the other player. If the other player's defections outnumber their cooperations, **judge** will defect; otherwise this strategy will cooperate. + **Mimic**: this program cooperates on the first round, and then on every subsequent round it mimics the other player's previous move. Thus, if the other player cooperates (defects) in round $n$, then **mimic** will cooperate (defect) in round $n + 1$. All of these strategies are extremely simple. (Indeed, the first three do not even pay any attention to the other player; their responses are uninfluenced by the previous rounds of the game.) Nevertheless, simplicity is not necessarily a disadvantage. In a computer tournament held in the late 1970's by the political scientist Robert Axelrod, the first-prize program employed the **mimic** strategy, and achieved the highest average score in a field of far more complicated programs. A Python program for an iterated prisoner's dilemma game is provided in [`prisoner2.py`](prisoner2.py) as part of the code for this project. The function `play_loop` pits two players (or, to be more precise, two *strategies*) against one another for 100 games, then prints out the average score of each player. Player strategies are represented as functions. Each strategy takes two arguments--its own *history* (that is, a representation of all its previous *plays*) and its opponent's *history*. The strategy function returns either the string `"c"` for *cooperate* or the string `"d"` for *defect*. At the beginning of an iterated game, each history is empty. As the game progresses, the histories grow to represent the sequence of plays (`"c"`s and `"d"`s) from least recent to most recent. Note each strategy must have its own history as its first argument. So in `play_loop`, `strat0` has `history0` as its first argument, and `strat1` has `history1` as its first argument. Each history is represented by a `History` object. We'll see how to define our own objects towards the end of the term, but for now we'll focus on using them. The seven possible operations on a `History` object are described [here](history_doc/history.html). These operations are functions associated with a particular object, and are called *methods*. As an example, consider the [`get_most_recent`](history_doc/history.html#history.History.get_most_recent) method. The description says it *Returns the most recent action*. Remember that actions are either `"c"` for cooperate or `"d"` for defect. Thus, it enables you to check our opponent's most recent action, which will be useful for something like the **mimic** strategy. Note that it has one parameter, `self`. In Python, all methods have this `self` parameter. All you need to know now is that this parameter is automatically supplied and not something you provide when calling the method. So to get the most recent action of a `History` stored in the variable `other_history`, you would do something like ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Python last_action = other_history.get_most_recent() ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The values from the game matrix are stored in variable named `rubric` using something called a *dictionary* (more about these later in the term). `print_results` uses it to calculate the scores at the end of the iterated game. ### Problem 1 The `betrayer` strategy is already implemented in `prisoner2.py`. Implement the other four strategies described above: `friend`, `chaos`, `judge`, and `mimic`. Use `play_loop` to play games among the five defined strategies. Notice how a strategy’s performance varies sharply depending on its opponent. For example, `friend` does quite well against `mimic` or against another `friend`, but it loses badly to `betrayer`. Pay special attention to `mimic`. Notice how it never beats its opponent--but it never loses badly. Create a 25-element table in which you show the average score for tournaments pitting all possible pairings of the five different strategies: `betrayer`, `friend`, `mimic`, `chaos`, `judge`. You can use a spreadsheet like Excel or Google Sheets to make the table or just do it in a plain text file. Describe the behavior you observe for the different strategies. ### Problem 2 Write a new strategy `careful_mimic`. The strategy should always cooperate unless the opponent defected on both of the previous two rounds. If fewer than two rounds have been played, it should cooperate. Play `careful_mimic` against `betrayer`, `mimic`, and `judge`. Describe the behavior you observe. ### Problem 3 Write a function `make_alternating_strategy` which takes as input two strategies (say, `strat0` and `strat1`). `make_alternating_strategy` should return a strategy which alternates between `strat0` and `strat1`. Use this function to make two new strategies (your choice as to what existing strategies these new ones alternate between). Test them against `betrayer`, `mimic`, and `judge` and describe the performance. ### Problem 4 Write a function `gentle`, which takes as input a strategy (say `strat`) and a number between 0 and 1 (call it `gentleness_factor`). The `gentle` function should return a strategy that plays the same as `strat` except: when `strat` defects, the new strategy should have a `gentleness_factor` chance of cooperating. (If `gentleness_factor` is 0, the returned strategy performs exactly the same as `strat`; if `gentleness_factor` is 0.5, the returned strategy cooperates half the time that `strat` defects; if `gentleness_factor` is 1, the returned strategy performs the same as `friend`.) Use `gentle` with a low value for `gentleness_factor` - say, 0.1 - to create two new strategies: `slightly_gentle_betrayer` and `slightly_gentle_mimic`. Test them against `betrayer`, `mimic`, and `judge` and describe the results. ## The Three-Player Prisoner’s Dilemma So far, all of our prisoner's dilemma examples have involved two players (and, indeed, most game-theory research on the prisoner's dilemma has focused on two-player games). But it is possible to create a prisoner's dilemma game that involves three--or even more--players. Strategies from the two-player game do not necessarily extend to a three-person game in a natural way. For example, what does `mimic` mean? Should the player defect if either of the opponents defected on the previous round? Or only if both opponents defected? And are either of these strategies nearly as effective in the three-player game as `mimic` is in the two-player game? Before we analyze the three-player game more closely, we must introduce some notation for representing the payoffs. We use a notation similar to that used for the two-player game. For example, we let DCC represent the payoff to a defecting player if both opponents cooperate. Note that the first position represents the player under consideration. The second and third positions represent the opponents. Another example: CCD represents the payoff to a cooperating player if one opponent cooperates and the other opponent defects. Since we assume a symmetric game matrix, CCD could be written as CDC. The choice is arbitrary. Now we are ready to discuss the payoffs for the three-player game. We impose three rules (actually, there is no universal definition for the multi-player prisoner's dilemma. The constraints used here represent one possible version of the three-player prisoner's dilemma): 1. Defection should be the dominant choice for each player. In other words, it should always be better for a player to defect, regardless of what the opponents do. This rule gives three constraints: \[DCC > CCC\] \[DDD > CDD\] \[DCD > CCD\] 2. A player should always be better off if more of his opponents choose to cooperate. This rule gives: \[DCC > DCD > DDD\] \[CCC > CCD > CDD\] 3. If one player's choice is fixed, the other two players should be left in a two-player prisoner's dilemma. This rule gives the following constraints: \[CCD > DDD\] \[CCC > DCD\] \[CCD > \frac{CDD + DCD}{2}\] \[CCC > \frac{CCD + DCC}{2}\] We can satisfy all of these constraints with the following payoffs: \[CDD=-5,\,DDD=-2,\,CCD=0,\,DCD=2,\,CCC=3,\,DCC=5\] We have provided Python code for a three-player iterated game in [`prisoner3.py`](prisoner3.py). It has a `play_loop` function that takes three strategies as input, keeps track of three histories, and prints out results for three players. ### Problem 6 Write strategies `betrayer3`, `friend3`, and `chaos3` that will work in a three-player game. Try them out to make sure your code is working. Write two new strategies: `tough_mimic` and `soft_mimic`. `tough_mimic` should defect if either of the opponents defected on the previous round. `soft_mimic` should defect only if both opponents defected on the previous round. Play `tough_mimic`, `soft_mimic`, and `chaos3` against each other. Describe the observed behavior of these strategies. ### Problem 7 Write a function `make_combined_strategy` which takes as input two two-player strategies and a *combining* function. `make_combined_strategy` should return a three-player strategy that plays one of the two-player strategies against one of the opponents, and the other two-player strategy against the other opponent, then calls the *combining* function on the two two-player results. The *combining* function should take two prisoner's dilemma choices and return a single choice. Here's an example: this call to `make_combined_strategy` returns a strategy equivalent to `tough_mimic` in Problem 6. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Python from prisoner2 import mimic def tough_combine(choice0, choice1): if choice0 == "d" or choice1 == "d": return "d" return "c" new_strat = make_combined_strategy(mimic, mimic, tough_combine) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The resulting strategy plays mimic against each opponent, and then calls the combining function on the two results. If either of the two two-player strategies has returned `"d"`, then the three-player strategy will also return `"d"`. You can use this example and verify that `new_strat` has the same behavior as `tough_mimic` to check your solution. ### Problem 8 (OPTIONAL): The Three-Player Prisoner's Dilemma Tournament As described earlier, Axelrod held a computer tournament to investigate the two-player prisoner's dilemma. We are going to hold a three-player tournament. You get to design a strategy for the tournament. You might submit one of the strategies developed in the project, or develop a completely new one. The only restriction is that the strategy must work against any other legitimate entry. Any strategies that cause the tournament software to crash will be disqualified. You need to: - Include in your homework submission on Moodle a copy of your function in a Python file called `tournament.py`. Include a brief description of how the strategy works in comments at the top. - The form of the submitted strategy should be a function that takes three arguments: the player's own history list and history lists for each of the other two players. The function should return either a `"c"` or a `"d"` for cooperate or defect. - We reserve the right to disqualify any entries that violate the spirit of the prisoner's dilemma game (e.g., by changing an opponent's history). - We strongly suggest that you try out your function before submitting it. The tournament will be a complete one, that is, every strategy plays against every other pair. Each game will consist of 100 rounds. If there are at least three submissions, the results of the tournament will be posted, so proper bragging rights can be observed. # Advice While completing this homework doesn't require a large amount of code, it can be conceptually challenging. It's in your best interest to get together with your partner early, so you have time to think carefully and ask questions. As a guideline, you should try and finish problems 1-4 be Monday or Tuesday next week. You will need to download [`history.py`](history.py) into the same folder as `prisoner2.py` and `prisoner3.py`. When you need to get information about a player's history, refer to the [documentation for the `History` object](history_doc/history.html). All the operations you will need are described there. Remember that Python takes care of providing the `self` parameter for you (e.g., `get_length` is called with no arguments and `get_recent_coop` is called with one argument). To get a random decimal number between 0 and 1.0, use [`random.random()`](https://docs.python.org/3.7/library/random.html#random.random). Remember to put `import random` at the top of your file. Full documentation for the `random` module is [here](https://docs.python.org/3.7/library/random.html) Functions are data like anything else in Python. They can be assigned to variables and given as arguments to other functions. Problems 3, 4, and 7 ask you to write functions that take strategies (i.e., other functions) as input. If you have a function assigned to the variable `strat0`, you can call that function like you would any other: `strat0(my_history, other_history)`. The functions you write for these problems also return strategies in the form of new functions. To do this, you can actually define a function within a function. For example, the `make_pow_fn` below takes a parameter `n` and returns a function that takes a number and returns that number to the `n`th power. Note how the function returned by `make_pow_fn` **remembers** the value of `n` it was create with. This is an powerful feature of Python and other programming languages that let's us create functions that create more functions. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Python def make_pow_fn(n): def pow_fn(x): return x**n return pow_fn pow3 = make_pow_fn(3) pow4 = make_pow_fn(4) print(pow3(2), pow4(2)) # prints 8 16 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ When `play_loop` prints out the average scores of the competing strategies, it uses `strat0.__name__` (that's `strat0.` followed by two underscores (`_`), then `name`, then two more underscores) and `strat1.__name__` (and `strat2.__name__` in the three-player version) to label these scores. These names will probably not be very informative for strategies created by `make_alternating_strategy`, `gentle`, and `make_combined_strategy`. To fix this, you can actually modify the `.__name__` property. Continuing with the `make_pow_fn` example, we can see that the functions it creates all have the same `.__name__` value: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Python def make_pow_fn(n): def pow_fn(x): return x**n return pow_fn pow3 = make_pow_fn(3) pow4 = make_pow_fn(4) print(pow3.__name__) # prints pow_fn print(pow4.__name__) # also prints pow_fn ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ but we can give them different names like this: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Python def make_pow_fn(n): def pow_fn(x): return x**n pow_fn.__name__ = "pow" + str(n) return pow_fn pow3 = make_pow_fn(3) pow4 = make_pow_fn(4) print(pow3.__name__) # prints pow3 print(pow4.__name__) # prints pow4 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Note that to combine the string `"pow"` and the integer `n` into a single piece of text, we had to first convert `n` to a string using the `str` function. Then we could use the plus operator (`+`) to concatenate the two strings (i.e., smush them together into a single new string) and assign the result to `pow_fn.__name__`. # What to Turn In Submit the following files via the [Moodle Homework 2 page](https://moodle.carleton.edu/mod/assign/view.php?id=486427). Remember that each file should have comments with your name, the date, and a description at the top. If you are working as a pair, please only one person submit. Note: you do **not** need to submit `history.py`. - A file called `prisoner2.py` with your solutions to Problems 1 through 4. You should be adding on to the provided [`prisoner2.py`](prisoner2.py) file, so your submission should also contain unchanged all the code originally in the provided file. - A file called `prisoner3.py` with your solutions to Problems 5 through 7. You should be adding on to the provided [`prisoner3.py`](prisoner3.py) file, so your submission should also contain unchanged all the code originally in the provided file. - A report with your descriptions of the behavior of the different strategies. This can include a spreadsheet. - OPTIONALLY a file called `tournament.py` with a single function that is your entry in the three-player prisoner's dilemma tournament. There should also be a description of your strategy in comments. - A file called `feedback.txt` with the following information (you get points for answering these questions!): - How many hours you spent outside of class on this homework. - The difficulty of this homework for this point in a 100-level course as: too easy, easy, moderate, challenging, or too hard. - What did you learn on this homework (very briefly)? Rate the educational value relative to the time invested from 1 (low) to 5 (high).