Game Theory, the Nash Equilibrium, and the Prisoner’s Dilemma

Douglas E. Hill

Douglas E. Hill received his bachelor’s degree in Mathematics/Computer Science from the College of Idaho, his master’s in Biomathematics from UCLA, and his doctorate in Philosophy from the Logic and Philosophy of Science department at the University of California, Irvine. He now teaches at Golden West College. This work released under a CC-BY license.

Game theory models human interactions. There are a lot of different ways that humans can interact, so there are a lot of different models. We will call these models games. It will help to start by first looking at two person games. In such a game, you and I are dealing with each other. We will assume that each of us is rationally self-interested; that is, that each of us will act to maximize our outcome, payoff, or payout. Of course, people do not always act in this way. Sometimes people act altruistically, for the benefit of others at some cost to themselves. Nonetheless, this will be a useful simplifying assumption.

One common model is the zero-sum game. In such a game, you win as much as I lose, and I win as much as you lose. This is the logic of the poker table, or a sporting match. Being rationally self-interested, I will try to win, and you should try to win, too. In such a game, our interests are diametrically opposed. Since you can only win when I lose, such a game would not allow for cooperation. Such a game only allows for win-lose or lose-win.

But not all interactions, or games, are like this. Some games allow for cooperation. In such a game, it is possible for both of us to benefit. Win-win and lose-lose are possible outcomes. An example would be a coordination game. Suppose you and I would each profit (say one dollar each) if we meet, and that we get nothing if we fail to meet. We might meet at the market, or we might meet in the park. In this coordination game, only a dollar is at stake, so let us raise the stakes. Suppose you and I are drivers, going in opposite directions on a road. Should you drive on the left or on the right? In such games, our interests are perfectly aligned. If we can coordinate on a meeting place we can both profit; if we can coordinate in the driving game we can both live. We would like to set up our society so that cooperative interactions are the rule rather than the exception. We shall see if this is possible.

The Nash Equilibrium

Consider a set of strategies, one for each player in a game. If no individual player can benefit by changing only his or her own strategy while the others keep theirs, then that set of strategies forms a Nash-equilibrium. The term Nash-equilibrium applies to the set of strategies taken by all the players, not to any one player’s individual strategy. If a player can only do worse by deviating, then the equilibrium is strict; if she can do just as well (but no better), then the equilibrium is weak; and if she can do better, then it is not an equilibrium. The Nash-equilibrium is named after John Nash (1928-2015), who proved that every finite game has at least one Nash-equilibrium, provided that players are allowed to randomize over their choices.

Take the case where we are driving on the road in opposite directions. How many equilibria does this game have? One equilibrium is where we each drive on our right. As long as we both stick to this, we will pass each other safely. If either of us deviates, we are liable to suffer injury or death. Since either of us would do strictly worse by deviating from the drive-on-the-right equilibrium, this equilibrium is strict. Similarly, there is an equilibrium where we both drive on our left, also a strict equilibrium. But there is also a third equilibrium where we each randomly choose, so that we drive on each side with a 50% chance. Say we each flip a coin, and drive on the right if it comes up heads, and on the left if it comes up tails. In this case, we have a 50% chance of safely passing each other, and a 50% chance of getting into an accident. Is this really an equilibrium? Yes, because if this is what I am doing, you cannot do any better by changing. No matter what you do, you will have a 50% chance of avoiding an accident. Because you cannot do any better by changing, this is an equilibrium too. But because you do not do any worse by changing, this is only a weak equilibrium. But it is an equilibrium nonetheless.
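
The three equilibria of the driving game can be checked with a short sketch. The payoffs here are an assumption made for illustration (1 for passing safely, 0 for a collision), and the function names are hypothetical; the essay itself assigns no numbers to this game.

```python
from itertools import product

# Assumed payoffs for illustration: 1 if the drivers pass safely, 0 if they collide.
# Each driver picks a side relative to their own direction of travel,
# so the drivers pass safely exactly when their choices match.
SIDES = ("left", "right")

def payoff(mine, yours):
    """Payoff to a driver choosing `mine` when the other driver chooses `yours`."""
    return 1 if mine == yours else 0

def is_pure_equilibrium(a, b):
    """True if neither driver can gain by switching sides alone."""
    a_is_best = all(payoff(a, b) >= payoff(alt, b) for alt in SIDES)
    b_is_best = all(payoff(b, a) >= payoff(alt, a) for alt in SIDES)
    return a_is_best and b_is_best

pure_equilibria = [p for p in product(SIDES, SIDES) if is_pure_equilibrium(*p)]
print(pure_equilibria)  # [('left', 'left'), ('right', 'right')]

def expected_payoff(p_right_mine, p_right_yours):
    """Chance of passing safely when each driver picks 'right' with the given probability."""
    return p_right_mine * p_right_yours + (1 - p_right_mine) * (1 - p_right_yours)

# Against a coin-flipper, every choice of mine gives the same 50% chance,
# so I cannot gain by changing: the coin-flip profile is a weak equilibrium.
for p in (0.0, 0.5, 1.0):
    print(p, expected_payoff(p, 0.5))  # 0.5 in every case
```

The first check finds the two strict equilibria; the loop shows why the coin flip is an equilibrium but only a weak one, since no deviation helps and none hurts.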

This coordination game shows us a few things about Nash equilibria. There is not always a single equilibrium, and just because something is an equilibrium does not mean that it is a desirable outcome. There is the problem of Equilibrium Selection. In the driving game, we both very much want to be on the drive-on-the-left or the drive-on-the-right equilibria. We want to avoid the flip-a-coin-and-hope-for-the-best equilibrium. The problem is that the other two equilibria are equally good, so we may not know which one to choose. In this case, the local custom tells us which equilibrium we should coordinate on. When in Rome, we drive on the right, because that is what the Romans do. But when in London, we drive on the left, because that is what the Londoners do.

John Nash’s life is depicted in the movie A Beautiful Mind (2001), directed by Ron Howard. This film has a scene meant to illustrate Nash’s ideas. In this scene, Nash, portrayed by Russell Crowe, imagines a scene in a bar. Several men are waiting in the bar when in walk several attractive brunette women and one very attractive blonde woman. In the first iteration (as the Nash character imagines), the men all pursue the blonde. But she must reject most of them, who then each pursue a brunette. But the brunettes do not wish to be someone’s second choice, so then they also reject each man. So each man (except perhaps one who has won the blonde) finds that his strategy has led him to a bad outcome. So they rethink their strategies, and try again in another iteration. In this case, each man initially approaches a brunette. Each man is happy to have a chance at winning a woman, each brunette is happy to be a man’s first choice, and the only one left out is the blonde, who is shocked at being ignored. While we might wish that the film could have come up with an example that treated the women as players rather than outcomes in a game, let us here ask if it illustrated Nash’s idea. Did the men find a Nash equilibrium?

While the men found an outcome that was better for most of them than when they all approached the blonde, their new strategies do not form a Nash equilibrium. For a set of strategies to form a Nash-equilibrium, no one can do better by changing. But in this scene, any one of the men could do better by approaching the blonde who was then being ignored. As this scenario is described, a chance at winning the blonde is the biggest prize. As long as she is ignored, there is a better outcome available for someone, so any outcome that leaves her ignored cannot be a Nash-equilibrium. Thus this scene failed to illustrate a Nash-equilibrium. (To be fair to the film, it never says “Nash-equilibrium” in this scene, or anywhere else in the movie. Instead, Crowe as Nash says “governing dynamics”.)

The Prisoner’s Dilemma

Suppose you and an acquaintance are arrested by the police. They separate you and offer each of you this deal. “We think that you and your friend are accomplices in a serious crime, but we cannot prove it. So we’ll offer you this deal: If you testify that your friend committed this crime and she stays silent, we will let you go free right now, and we will sentence her to serve ten years in prison. If you both testify against each other, you will each get five years. If you both keep silent, we will hold you as long as we can without charging you, about six months. And we are making this same offer to your friend.” What should you hope for, and what should you do?

You should hope that your acquaintance keeps quiet, and she should hope that you keep quiet. In game theoretic terms, this is known as cooperating, and testifying against your partner is known as defecting. If you both cooperate with each other, you each only get six months in jail. But you can do better. If your partner keeps silent, you could testify against her; that is, you could defect. In this case, you go free, which is a better outcome for you. And what if your partner does not cooperate? What if she defects by testifying against you? In that case, if you keep silent (cooperate), you get ten years in prison. If you testify against her as well (defect), then you only get 5 years. So no matter what she does, you are better off defecting against her. And she can follow the same logic, so no matter what you do, she is better off defecting against you.

So the only Nash-equilibrium in the prisoner’s dilemma is for both of you to defect. This does not mean that this is the best outcome available to you. This equilibrium leads you both to a very bad outcome, where each of you spends five years in prison. There is another outcome that is much better for both of you: for you both to cooperate by keeping silent. In that case, you only get six months. If you could somehow agree to this option that gives you your second-best result, you can avoid the much worse second-worst result. But as long as someone can do better by deviating from this, it is not an equilibrium. And in this case, either of you can do better by defecting. Mutual cooperation is not an equilibrium; the only equilibrium is mutual defection. So following your rational self-interest leads both of you to a very inefficient outcome.
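
The reasoning above can be checked mechanically. The following is a minimal sketch that uses the sentences from the police offer (0, 0.5, 5, and 10 years); the names `YEARS`, `best_response`, and `is_equilibrium` are hypothetical helpers introduced only for this illustration.

```python
# Years in prison for (my move, partner's move), taken from the police offer.
YEARS = {
    ("cooperate", "cooperate"): 0.5,  # both keep silent: about six months
    ("cooperate", "defect"): 10,      # I keep silent, my partner testifies
    ("defect", "cooperate"): 0,       # I testify, my partner keeps silent
    ("defect", "defect"): 5,          # we both testify
}
MOVES = ("cooperate", "defect")

def best_response(partner_move):
    """The move that minimizes my prison time, given my partner's move."""
    return min(MOVES, key=lambda mine: YEARS[(mine, partner_move)])

def is_equilibrium(a, b):
    """True if each move is a best response to the other."""
    return a == best_response(b) and b == best_response(a)

# Defecting is the best response whatever the partner does.
print(best_response("cooperate"), best_response("defect"))  # defect defect

equilibria = [(a, b) for a in MOVES for b in MOVES if is_equilibrium(a, b)]
print(equilibria)  # [('defect', 'defect')]

# Yet mutual cooperation would leave both players better off.
print(YEARS[("cooperate", "cooperate")] < YEARS[("defect", "defect")])  # True
```

Because each best response here is unique, comparing moves for equality is enough; a game with ties would need a check that allows several best responses.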

Lest you think that the lesson here is that crime does not pay, I never wrote whether or not you and your acquaintance had actually committed the crime of which you were accused. You are better off testifying against her, and she against you, regardless of whether you actually committed the crime. But the prisoner’s dilemma structure is not limited to accused criminals. Many common human interactions have the structure of the prisoner’s dilemma. Take the example of trade: you have a rare book that you no longer want. I have $100 which I would gladly pay for the book. The book is worth more than $100 to me, and less than $100 to you, so we ought to make a deal. But this simple interaction also has the structure of a prisoner’s dilemma. Each of us has something that the other wants, and we agree to the exchange. But then you think, maybe I will send you the money and maybe I won’t. If I do not send the money, you would sure hate to have been cheated out of the book. And even if I do send the money, you are still better off with the money and the book. Similarly, I know that maybe you will send the book and maybe you won’t. If you do not send the book, I would sure hate to have been cheated out of the money. And even if you do send the book, I am still better off with the book and the money. Each of us is aware of what the other is thinking, so neither of us honors a deal that would have made us both better off.

Making an agreement works better when we can agree to an equilibrium. A good contract gives no one an incentive to break it. The prisoner’s dilemma, with its single non-cooperative equilibrium, represents a worst-case game. But the scenario models a common trading scenario. Nonetheless, we manage to trade, and find it in our self-interest to do so. Somehow we manage to change the game, and create some cooperative equilibria.

Iterated Prisoner’s Dilemma

Trade flourishes. And we do not trade only out of a moral concern for others; we find it in our best interests to trade. Our way out of the tragic logic of the prisoner’s dilemma lies in the fact that we wish to trade again. One is unlikely to make a living, much less get rich, with a single trade. But we trade now with an eye to the future. I am liable to cooperate with you today in the hopes that we will cooperate today and in the future. “The shadow of the future” changes the game. Let us think about how we actually manage to make that deal about the book. Are you really better off keeping the book, regardless of what I do? If we expect to trade again in the future, you are better off keeping your end of the bargain, and so am I.

Game theoreticians model this by repeating the game. Each repetition can be called an iteration, stage, or trial. So instead of a single-stage prisoner’s dilemma, we now have a repeated or iterated prisoner’s dilemma. Let us suppose that we will keep playing the prisoner’s dilemma every day. A dollar today is worth more to us than a dollar tomorrow, so we need to discount the value of that dollar tomorrow. The factor by which we discount the future is called, naturally enough, the discount factor. The discount factor can also be thought of as the probability of another iteration. The mathematics is the same for either interpretation. A famous result called the Folk Theorem says that given a sufficient discount factor (i.e. we sufficiently value the future, or think that additional play is sufficiently likely), then any outcome that we could rationally agree to with a binding contract forms a Nash equilibrium without such a binding contract. This is called the Folk Theorem because game theoreticians assumed it was true (and cited it in their papers) before it was actually proven (involving various technical assumptions).
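
To see what discounting does to a stream of payoffs, here is a small sketch. The discount factor of 0.9 and the payoff of one dollar per round are assumed values chosen only for illustration, and the function names are hypothetical.

```python
def discounted_value(payoff_per_round, delta, rounds):
    """Sum of payoff_per_round * delta**t over rounds t = 0, 1, ..., rounds - 1."""
    return sum(payoff_per_round * delta**t for t in range(rounds))

def infinite_value(payoff_per_round, delta):
    """Closed form of the infinite geometric sum, valid for 0 <= delta < 1."""
    return payoff_per_round / (1 - delta)

delta = 0.9  # assumed: a dollar tomorrow is worth 90 cents today

# A long finite sum comes very close to the closed form, which is about 10.
print(discounted_value(1.0, delta, 1000))
print(infinite_value(1.0, delta))

# A lower discount factor shrinks the weight of the future.
print(infinite_value(1.0, 0.5))  # 2.0
```

Under the other interpretation, where 0.9 is the probability that another round is played, the same expression 1 / (1 - 0.9) gives the expected number of rounds, which is why the two readings lead to the same mathematics.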

With the infinitely (or indefinitely) repeated game, an infinite number of equilibria now exist. And “always defect” remains an equilibrium: if you and I both plan to defect on every play, either of us will only do worse by cooperating. But there are now cooperative equilibria as well. But one cannot simply cooperate regardless of the play of the other. The strategy to always cooperate is not an equilibrium, for then one of us could do better by defecting. And consistently defecting against a cooperator gives you your best possible outcome. But we can make our play conditional on each other’s past play.

One popular strategy in the literature is called grim-trigger. In grim-trigger, you start out cooperating, and keep cooperating as long as I cooperate. But as soon as I defect, you defect every play after that. In this strategy, you only get burned once. Once you figure that I’m not a consistent cooperator, you never take a chance on me again. A strategy that has done well in tournaments is tit-for-tat. In tit-for-tat, you start off cooperating, and then henceforth copy the previous play of your partner. So as long as we cooperate with each other, we will keep cooperating. But if you are playing tit-for-tat and I defect against you, I will have to cooperate with you, while you defect against me, to get you to cooperate again. The loss that I take from having you defect against me can be regarded as a penalty or punishment.
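
These strategies are simple enough to write down as rules. The sketch below is one possible encoding, not the form used in any particular tournament; the function names and the `play` helper are hypothetical, and moves are recorded as "C" for cooperate and "D" for defect.

```python
C, D = "C", "D"  # cooperate, defect

def grim_trigger(my_history, their_history):
    """Cooperate until the other player has defected once, then defect forever."""
    return D if D in their_history else C

def tit_for_tat(my_history, their_history):
    """Cooperate first, then copy the other player's previous move."""
    return their_history[-1] if their_history else C

def always_defect(my_history, their_history):
    """Defect on every play."""
    return D

def play(strategy_a, strategy_b, rounds):
    """Play the two strategies against each other and return both move sequences."""
    hist_a, hist_b = [], []
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        hist_a.append(move_a)
        hist_b.append(move_b)
    return "".join(hist_a), "".join(hist_b)

print(play(grim_trigger, tit_for_tat, 5))    # ('CCCCC', 'CCCCC')
print(play(tit_for_tat, always_defect, 5))   # ('CDDDD', 'DDDDD')
print(play(grim_trigger, always_defect, 5))  # ('CDDDD', 'DDDDD')
```

Both conditional strategies get burned only in the first round against a constant defector, and they cooperate indefinitely with each other.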

Many other strategies have been tried in tournaments and simulations. And in the infinitely repeated game, there are an infinite number of equilibria. So equilibrium selection becomes a problem. But cooperative strategies generally form equilibria with each other: if you are playing grim-trigger, and I am playing tit-for-tat, and we both sufficiently value the possibility of future play, then we will keep cooperating and neither of us can do any better with a different strategy. But non-cooperative strategies also form equilibria with each other. If you start off defecting and then repeat my previous play (we could call this suspicious-tit-for-tat), and I always defect, then we will keep defecting against each other, and either of us will do worse if we change to a strategy that has us cooperate.
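
How much must we value the future for cooperation to hold? A rough sketch for the grim-trigger case follows. The stage payoffs T, R, P, and S are assumed values chosen for illustration (the essay gives none for the trading game): T is the payoff for defecting against a cooperator, R for mutual cooperation, P for mutual defection, and S for cooperating against a defector.

```python
# Assumed stage-game payoffs, with T > R > P > S as in any prisoner's dilemma.
T, R, P, S = 5, 3, 1, 0

def value_of_cooperating(delta):
    """Discounted value of mutual cooperation in every round."""
    return R / (1 - delta)

def value_of_defecting(delta):
    """Defect once against grim-trigger, then face mutual defection forever."""
    return T + delta * P / (1 - delta)

# Cooperation is a best response to grim-trigger when
#   R / (1 - delta) >= T + delta * P / (1 - delta),
# which rearranges to delta >= (T - R) / (T - P).
threshold = (T - R) / (T - P)
print(threshold)  # 0.5

for delta in (0.3, 0.5, 0.7):
    print(delta, value_of_cooperating(delta) >= value_of_defecting(delta))
# 0.3 False, 0.5 True, 0.7 True
```

With these numbers, cooperation survives only if the discount factor is at least one half. The threshold against other conditional strategies, such as tit-for-tat, can differ, because the punishment they impose is different.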

One of these cooperative strategies may get us to cooperate if you and I expect to repeatedly trade with each other. But what if you and I are unlikely to trade again? One possibility is that rational cooperation may not be possible. Since we do not expect to do business again in the future, we are too tempted to defect against one another today. This is why you are more likely to be cheated by a business that caters to tourists than by a business in your home town. The business in your home town knows that your repeat business is more likely if it treats you well now. The tourist trap knows that your repeat business is unlikely regardless of how it treats you. In a sufficiently small community, traders may know each other, and know who has a reputation for fair dealing, and who has a reputation for defecting. In this case you could defect against the defectors you know, and (assuming that you want to keep trading), cooperate with the cooperators you know.

But human ingenuity has found ways to extend reputation to larger communities. Think again about the case where you want to sell a book, and I want to buy it. If we are trading on a website such as eBay or Amazon, you are liable to report my failure to send the money, and I am liable to report your failure to send the book. Future traders will learn about this, and this will hurt our reputations for future trading. If either of us wants to do business in the future, we will find it in our interest to honor our deal today. In this case the website that records our history acts as an institution that expands the possibility of future deals. As long as we expect to play again with someone (not necessarily the same player), we may sufficiently value the future to cooperate today.

Similarly, if a bank loans you money, you might be tempted not to pay. For a large loan, the bank will have some collateral: typically the car or house that you borrowed the money for. If you do not repay, the bank will repossess the collateral. But for a small loan, a bank may not hold any collateral, and if you do not repay, may decide that it is not worth it to try to collect. Nonetheless you will usually still find it in your interest to repay. If you do not, the bank will report this unpaid debt to a credit bureau. And then you will find it difficult to borrow money in the future, as other banks will ask the credit bureau about you before loaning you money.

Institutional solutions such as eBay and credit bureaus have other advantages over popular strategies such as grim-trigger or tit-for-tat. There are an infinite number of equilibria, and we might worry that we are stuck on a non-cooperative equilibrium. But the institution could not exist if traders did not find it in their interest to use it. So the existence of such institutions signals that there is a cooperative equilibrium. And such institutions are robust to errors. In tit-for-tat, if you are mistakenly thought to have defected, you will be defected against, in a continuing cycle, until this is corrected by another error. In grim-trigger, if you are mistakenly thought to have defected, you are never cooperated with again. But as long as the error rates of a reputation reporting institution, such as a credit bureau, are sufficiently low, traders can find it in their interest to cooperate, and to pay any penalties, even penalties imposed by mistake.
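
The fragility of these two strategies under error can be seen in a small simulation. This is a sketch under a simplifying assumption: exactly one cooperative move is misreported as a defection, in a round we choose, and there are no other errors. The helper `play_with_one_error` is hypothetical.

```python
C, D = "C", "D"  # cooperate, defect

def tit_for_tat(seen_from_other):
    """Cooperate first, then copy what the other player appeared to do last."""
    return seen_from_other[-1] if seen_from_other else C

def grim_trigger(seen_from_other):
    """Cooperate until the other player appears to have defected, then defect forever."""
    return D if D in seen_from_other else C

def play_with_one_error(strategy, rounds, error_round):
    """Both players use `strategy`. In `error_round` (counting from 0),
    player B's move is misreported to player A as a defection."""
    seen_by_a, seen_by_b = [], []  # what each player believes the other did
    moves_a, moves_b = [], []      # what each player actually did
    for t in range(rounds):
        a = strategy(seen_by_a)
        b = strategy(seen_by_b)
        moves_a.append(a)
        moves_b.append(b)
        seen_by_a.append(D if t == error_round else b)
        seen_by_b.append(a)
    return "".join(moves_a), "".join(moves_b)

print(play_with_one_error(tit_for_tat, 8, 2))   # ('CCCDCDCD', 'CCCCDCDC')
print(play_with_one_error(grim_trigger, 8, 2))  # ('CCCDDDDD', 'CCCCDDDD')
```

After the single mistake, the tit-for-tat players fall into an alternating cycle of retaliation, and the grim-trigger players never cooperate again.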

So when the prisoner’s dilemma scenario is repeated, cooperative equilibria are possible. But cooperation is not guaranteed, as non-cooperative equilibria remain. And we may quite rationally fear that we are stuck on such a non-cooperative equilibrium. Non-cooperation is always an equilibrium, and may be the only equilibrium. Having and maintaining a cooperative equilibrium depends on many things, including sufficiently valuing future trading prospects, and some indication that we are on a cooperative equilibrium. Institutions like credit bureaus and eBay can make future prospects more likely and signal a cooperative equilibrium. But cooperation, and reputation that can make it possible, are fragile. Benjamin Franklin recognized this when he wrote “Glass, China and Reputation are easily cracked, and never well mended.”

The Tragedy of the Commons

We need not limit games to two players. A scenario with the structure of the prisoner’s dilemma, extended to many players, is called the tragedy of the commons. Named by Garrett Hardin (1915-2003), it is based on the following scenario: A number of ranchers keep their herds (say of cattle or sheep) in their barns, but there is a common field, the commons, that they graze on. The commons is a renewable resource that can regenerate itself if it is not overgrazed. And at one point, the commons is sufficient to feed all the animals. But then each rancher thinks, “I can be a little richer if I herd a few more animals; after all, if no one else does this, the commons can easily handle my few additional grazers. And if everyone else grazes more, I want to get a bit more before the commons is destroyed.” But then every rancher reasons the same way, and the commons is overgrazed and destroyed. This has the same structure as the prisoner’s dilemma: you cooperate by keeping your herd small, and you defect (against the other herders) by increasing your herd. Everyone is better off if they can all cooperate, but each herder is better off with more animals, no matter what the other herders do.
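
A toy version of the commons makes the parallel concrete. All numbers here are assumptions chosen for illustration: ten herders, a private benefit of 1 for grazing an extra animal, and a cost of 3 per extra animal that is spread evenly over everyone who uses the commons.

```python
N = 10             # assumed number of herders
BENEFIT = 1.0      # assumed private gain from grazing one extra animal
SHARED_COST = 3.0  # assumed damage per extra animal, split evenly among all herders

def payoff(my_extra, others_extra_total):
    """Payoff to one herder who adds `my_extra` animals (0 or 1)
    when the other herders add `others_extra_total` animals in all."""
    total_extra = my_extra + others_extra_total
    return BENEFIT * my_extra - SHARED_COST * total_extra / N

# Whatever the others do, adding an animal pays: I keep the whole benefit
# but bear only one tenth of the cost.
print(all(payoff(1, k) > payoff(0, k) for k in range(N)))  # True

# Yet if everyone adds an animal, everyone is worse off than if no one had.
print(payoff(0, 0))      # 0.0
print(payoff(1, N - 1))  # -2.0
```

Adding an animal plays the role of defecting: it is each herder's best choice no matter what the others do, and the outcome when everyone makes it is worse for all than mutual restraint.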

Many ecological problems, such as overfishing, have this structure. Every fisherman wants an ample stock of fish in the sea, but every fisherman is tempted to catch a few more fish. Each fisherman thinks “If no one else overfishes, the stock of fish can spare a few more for me. And if everyone else overfishes, I want to get a few more fish while there are still fish to be caught.” And as a result, the fish may be wiped out, or severely depleted, whereas if the fishermen had limited their catch, they could have left enough fish to reproduce and provide fish forever. In these tragedies of the commons, the individual rancher or fisherman gets the benefit of the extra animals, but the cost is borne by the whole community that uses the commons.

One possibility is that the commons is doomed. Rather than being held in common, such a resource cannot be shared and so must be owned by someone, such as an individual, corporate entity, or government, who is responsible for it. This would be bad news, since some resources (like a field or fishing zone) could be plausibly owned and managed by some authority, but other larger shared resources, such as the oceans or the air, could not. However, Elinor Ostrom (1933-2012) won the 2009 Nobel Prize in Economics for showing how many communities have long solved problems with Tragedy of the Commons structures. Ostrom observed that local communities have come up with local solutions: successful arrangements to share the commons have in common low-cost dispute resolution, with the users of the commons themselves creating and modifying the rules, among other principles. Well-meaning authorities from the outside have sometimes messed up local arrangements that had been working well. Again, human ingenuity has found ways to change the structure of the game to make cooperation possible, saving the commons.

A Cautiously Optimistic Conclusion

Two competing views of human nature come from the French/Swiss philosopher Jean-Jacques Rousseau (1712-1778), and the English philosopher Thomas Hobbes (1588-1679). Both look back to imagine the primitive state of humanity before civilization. Rousseau’s positive image is the “Noble Savage”; he writes “nothing is more gentle than man in his primitive state”. Hobbes’ cynical image is the “State of Nature” which consists of the “war of all against all” where life is “solitary, poor, nasty, brutish and short.” If Rousseau is right, then cooperation should come naturally. Modernity has corrupted us to now care too much about our own interests, so we somehow must restore ourselves to a state closer to the Noble Savage. If Hobbes is right, then defection will come naturally, and we will have to work to make cooperation possible.

Rousseau and Hobbes did not have the archeological evidence to settle their dispute. But we do, and Harvard psychologist Steven Pinker summarizes this evidence in his books The Blank Slate and The Better Angels of our Nature. Alas, he finds that “man in his primitive state” lived with a shockingly high rate of violence. Pinker bluntly concludes “Hobbes was right, Rousseau was wrong.” We should not be surprised. The difficulty comes from the structure of some common interactions, interactions that even a Noble Savage would have faced.

Hobbes’ solution was to institute an absolute sovereign: we form a social contract to make one of us the dictator who will create and enforce the law, taking us out of the violent State of Nature. This absolute dictator, the sovereign, will make us cooperate. He will impose penalties on defectors so that it will no longer be in their interests to cheat. Defecting will no longer be an equilibrium. Looking at the history of dictators since Hobbes’ time, we might be forgiven for our reluctance to institute this option.

Hobbes was an astute analyst of human conflict, able to reason how early humans must have lived. But we need not follow Hobbes all the way to his conclusion. We have seen several cases above where people have worked out cooperative equilibria with much less force, without having to invest so much power in one person. Credit bureaus and eBay, without any police power, create systems where most people find it in their interests to pay their debts and honor their deals. Ostrom documents that many communities, without the power of the states they are part of, made it in their interests to share and maintain the commons.

So to create a better society, we must continue to find cooperative equilibria. Societies that have asked their people to act out of equilibrium, against their own interests, experienced poverty and horrors as their people let them down again and again. Societies that have created cooperative equilibria have prospered, because when people find it in their own interests to benefit others, they naturally do so.

Ironically, Hobbes’ pessimistic view of human nature leaves us with a conditionally hopeful conclusion. Human ingenuity has found ways to change the structures of these games, to create cooperative equilibria, generally without the heavy hand of the state (which has often made things worse). But the non-cooperative solutions in general remain equilibria, so cooperation is fragile and not inevitable. The past was awful, but the world has gotten better as we have solved more problems. And as long as we continue to solve more problems, to create and coordinate on cooperative equilibria, the future looks bright.^[1]

For Review and Discussion:

What is a Nash equilibrium?
What is the prisoner’s dilemma?
How do we avoid being trapped in a prisoner’s dilemma?

Notes and Additional Readings

Axelrod, Robert M. (1984) The Evolution of Cooperation. New York, NY. Basic Books, Inc.

Fudenberg, Drew, and Tirole, Jean. (1991) Game Theory. Cambridge, MA. The MIT Press.

Hill, Douglas E. (2004 May) “Errors of Judgment and Reporting in a Law Merchant System.” Theory and Decision, Vol. 56, Issue 3, pp. 239-267.

Hill, Douglas E. (2004) Reputation in a World of Errors and Corruption. Doctoral Dissertation. University of California, Irvine.

Hobbes, Thomas. (1651) Leviathan: with selected variants from the Latin edition of 1668. Edwin Curley, editor. (1994) Indianapolis, IN. Hackett Publishing Co.

Klein, Daniel B., editor. (1997) Reputation: Studies in the Voluntary Elicitation of Good Behavior. Ann Arbor, MI. University of Michigan Press.

Kosko, Bart. (2002 February 13) “How Many Blonds Mess Up a Nash Equilibrium?” Los Angeles Times.

Myerson, Roger B. (1991) Game Theory: Analysis of Conflict. Cambridge, MA. Harvard University Press.

Ostrom, Elinor. (1990) Governing the Commons: The Evolution of Institutions for Collective Action. New York, NY. Cambridge University Press.

Pinker, Steven. (2002) The Blank Slate: The Modern Denial of Human Nature. New York, NY. Penguin Books.

Pinker, Steven. (2011) The Better Angels of our Nature: Why Violence has Declined. New York, NY. Viking.

Poundstone, William. (1992) Prisoner’s Dilemma. New York, NY. Doubleday.

Rousseau, Jean-Jacques. (1754) Discourse on the Origin and Basis of Inequality Among Men. G.D.H. Cole, translator. (1913). Constitution.org.

Skyrms, Brian. (1996) Evolution of the Social Contract. New York, NY. Cambridge University Press.

Skyrms, Brian. (1998) “The Shadow of the Future.” In Coleman, Jules L. and Morris, Christopher W. Rational Commitment and Social Justice: Essays for Gregory Kavka. Cambridge, UK. Cambridge University Press. Pp. 12-21.

[1] Thanks to Kristin Seemuth Whaley and Noah Levin for their helpful comments on this essay. Thanks to my wife for her insight and support. Any errors or omissions belong to the author alone.

License

Game Theory, the Nash Equilibrium, and the Prisoner’s Dilemma Copyright © 2019 by Douglas E. Hill is licensed under a Creative Commons Attribution 4.0 International License, except where otherwise noted.