Flirting With Disaster
By Marc Gerstein    

A Beautiful Mind

Nash's Equilibrium Meets Schelling's Curves


The [Nash] equilibrium is without doubt the single game theoretic solution concept that is most frequently applied in economics. Economic applications include oligopoly, entry and exit, market equilibrium, search, location, bargaining, product quality, auctions, insurance, principal-agent [problems], higher education, discrimination, public goods, what have you. On the political front, applications include voting, arms control and inspection, as well as most international political models (deterrence, etc.). Biological applications all deal with forms of strategic equilibrium; they suggest an interpretation of equilibrium quite different from the usual overt rationalism.


Aumann, R. J., "Game theory," in The New Palgrave Dictionary of Economics, edited by M. Milgate and P. Newman, 1987, pp. 460-482.

Nobel Laureate John Nash


Mathematician John Nash received a Nobel prize for his 1949 formulation, as a graduate student, of a particular type of equilibrium in non-cooperative n-player games. His equilibria, now universally called Nash equilibria, are those in which each player chooses his best strategy given what the others are doing, so that no player can improve his payoff by unilaterally changing his strategy. Note that this is not necessarily the outcome that maximizes the payoffs. In fact, Nash equilibria are often grossly sub-optimal.

In chapter 10, Flirting With Disaster describes a problem known as the Prisoner's Dilemma by way of its traditional representation of two criminals who are enticed to inform on each other with a set of carefully structured rewards (see below). In the additional example explored on this page, a real-world commercial illustration is used to introduce a whole family of related problems, as well as to demonstrate a clever graphical technique developed by Thomas Schelling to analyze them.

The Classical Prisoner's Dilemma

Imagine that two criminals have been apprehended shortly after a crime they are known to have committed, but without sufficient evidence to convict them. The District Attorney would rather have a strong conviction of one perpetrator than take her chances with circumstantial evidence in a trial of both, so she separates the suspects and offers them the chance to inform on each other.

The informer will get a very light sentence while the other criminal will get a very harsh one. If both stay silent, the D.A. will charge them both with weapons possession, a lesser crime but one she knows she can win at trial. If they both inform on each other, the D.A. will count herself lucky and seek a sentence somewhat less than the harsh extreme but not as light as the weapons charge.
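The D.A.'s offer can be written down as a small payoff table and checked for Nash equilibria. The sketch below is only an illustration: the sentence lengths (1, 3, 5, and 10 years) are assumptions chosen to respect the ordering described above, and the helper name is_nash is hypothetical.

```python
from itertools import product

# Years in prison for (my_choice, other_choice); lower is better.
# The numbers are illustrative assumptions that only preserve the ordering
# in the text: light < weapons charge < mutual informing < harsh.
SENTENCE = {
    ("inform", "silent"): 1,   # lone informer: very light sentence
    ("silent", "silent"): 3,   # both silent: weapons possession
    ("inform", "inform"): 5,   # both inform: somewhat less than harsh
    ("silent", "inform"): 10,  # betrayed: very harsh sentence
}
CHOICES = ("silent", "inform")


def is_nash(a, b):
    """True if neither suspect can shorten their own sentence by switching alone."""
    a_ok = all(SENTENCE[(a, b)] <= SENTENCE[(alt, b)] for alt in CHOICES)
    b_ok = all(SENTENCE[(b, a)] <= SENTENCE[(alt, a)] for alt in CHOICES)
    return a_ok and b_ok


nash_points = [pair for pair in product(CHOICES, CHOICES) if is_nash(*pair)]
print(nash_points)  # [('inform', 'inform')]
```

With these assumed numbers, mutual informing is the only point from which neither suspect gains by switching alone, even though both would serve less time if both stayed silent.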


A Two-company Competition


If a company considers lowering its prices to grab market share in a two-company market, it is often clear that both companies might well be better off at higher price levels rather than touching off a mutually-destructive series of price cuts that might well leave market shares unchanged, but profits lower. (This assumes that overall demand due to price elasticity will not rise sufficiently to offset the loss of profits arising from lowered profit margins.) The critical question is the balance of forces that both create and inhibit such self-destructive actions. To understand this behavior, it is essential to examine the payoffs carefully. To do so, let us consider a "pure case" in which prices and cost structures of the parties are at parity.

From their separate perspectives, each competitive firm in such a situation achieves an apparent benefit from lowering its prices. If they do so, and their competitor does not respond, they increase their market share. If the other party does respond, the initiator is better off at the new price parity than at the disadvantage that might arise if the other party had dropped their prices first. Thus, from each party's perspective, lowering prices is unambiguously preferred because lower prices are better than higher ones when one cannot truly know what the other party will do. At the same time, for obvious reasons, each party prefers the other not to lower prices at all, because that action is universally better no matter which action one chooses for oneself. (Photo copyright Tor-Erik Bakke.)

Each player therefore has a clear preference for himself, as well as a preference for the other party. Significantly, these preferences go in opposite directions. As a result of these payoffs, and acting from a self-interested perspective in the face of uncertainty, both companies will lower prices.

You will also recognize this logic as consistent with basic microeconomic theory, although the source of the dynamics is different.

Yet it is actually in the interests of both parties to make their unpreferred choice, i.e., not to lower their prices at all: the rewards are such that both are better off doing what they do not prefer than if either or both act as they do prefer. However, the less preferred option only pays off if the decision is mutual, whereas the preferred choice is optimum if you do not know for certain what the other party will do. Equally important, it is also optimum if the other party makes their preferred choice.

With these rewards, either party has an incentive to try to take advantage of the other by acting alone, whereas both are required to cooperate. It is vital to understand that if both parties try to take advantage of the other, they both lose---but not as much---in comparison to attempting to cooperate and "getting suckered." This structuring of the rewards is critical to the outcome.

Since we are viewing the prisoner's dilemma as a non-cooperative game, equilibrium in its serial (repeated) form requires each player to have some view of the other's strategy. Consistency of decision rules is the key. For instance, let us say that each player does not "defect" (the technical term used to refer to unilateral action) as long as the other does the same. Once having been betrayed, the player then defects on every subsequent round. One can see that this is a harsh version of "do unto others" in that there is no forgiveness. Assuming that the players can discern the pattern over multiple sessions of play, this strategy will lead to very high payoffs for both, although not quite as high as consistent cooperation.
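The unforgiving rule just described can be simulated in a few lines. This is a minimal sketch under stated assumptions: the per-round payoffs (5, 3, 1, 0) are illustrative values with the usual prisoner's dilemma ordering, ten rounds are played, and the function names grim, defect_once_at, and play are hypothetical.

```python
# Per-round payoffs (higher is better); values are illustrative assumptions
# with the usual prisoner's dilemma ordering T > R > P > S.
T, R, P, S = 5, 3, 1, 0
PAYOFF = {
    ("C", "C"): (R, R),
    ("C", "D"): (S, T),
    ("D", "C"): (T, S),
    ("D", "D"): (P, P),
}


def grim(my_history, their_history):
    """Cooperate until the other player has defected once, then defect forever."""
    return "D" if "D" in their_history else "C"


def defect_once_at(round_index):
    """Build a strategy that follows grim but defects unprovoked in one round."""
    def strategy(my_history, their_history):
        if len(my_history) == round_index:
            return "D"
        return grim(my_history, their_history)
    return strategy


def play(strategy_a, strategy_b, rounds=10):
    """Play the repeated game and return the total payoff of each player."""
    hist_a, hist_b = [], []
    total_a = total_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        hist_a.append(move_a)
        hist_b.append(move_b)
        total_a += pay_a
        total_b += pay_b
    return total_a, total_b


print(play(grim, grim))               # (30, 30)
print(play(defect_once_at(2), grim))  # (17, 17)
```

Under these assumptions a single unprovoked defection earns a one-round bonus but triggers permanent retaliation, so the deviator ends up with 17 rather than 30. That penalty is what keeps the rule stable.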

Another strategy, perhaps counter-intuitive, is for the players to alternate cooperation and defection. Of course, this is more complex, but there is no reason why it would not work as a stable equilibrium, assuming that the players could figure it out. A final strategy, even less obvious, is for one player to alternate cooperation and defection while the other always cooperates. If either varies, then the other will play non-cooperatively from then onward. (These ideas from David M. Kreps, Game Theory and Economic Modeling, Oxford, 1995.)

This last pattern creates a disproportionate payoff for the first player, but it is in the rational interests of the second player to put up with this: given the penalties for deviating from the pattern, playing cooperatively is the best they can do. This notion of "the best you can do under the circumstances" is vital to the notion of the Nash equilibrium, although it is clear that this has little to do with fairness. In fact, this is one of game theory's great strengths. By focusing on rewards and strategies rather than other factors, one can often gain great insight into the basic structure of the interaction and the manner in which certain behaviors are rewarded and others punished.


Multi-party Games

Thomas Schelling has extended the prisoner's dilemma into a multi-party game. It is useful to discuss such games as another aspect of modeling the behavior of complex systems.

The conversion from the deceptively simple two-party prisoner's dilemma problem to its multi-party analog is aided by Schelling's graphical display of the payoff structures. I call them "Schelling Curves." First, I will describe the overall logic and then use the curves to exemplify both the two-party and multi-party versions of the prisoner's dilemma. From there, expansion to other interesting cases is straightforward and informative.

Schelling expresses the payoffs associated with both the "preferred" and "unpreferred" choices as functions of the population selecting the unpreferred alternative. The preferred choice is called "Left" and the unpreferred "Right" to be consistent with the initial diagrammatic layout, although this convention is arbitrary.

Consistent with the general structure of the prisoner's dilemma, but involving an entire population rather than just two parties, the diagram shows that L, the preferred choice, always has a higher payoff than R, the unpreferred choice.

As you can see, however, there is some coalition size (where the R-curve crosses the x-axis, at the 50% point in this case) at which all of those choosing Right will be better off than they would be if all had chosen Left (where the payoff for everyone is nominally zero), even if the remainder continue to choose Left (and obtain even higher payoffs by doing so).

One can immediately see both the logic and the dysfunctional character of this situation. If all start choosing Left, there is no incentive to switch to Right unless the creation of a coalition to "get over the hump" can be assured. In fact, even if we assume that a disciplined population can create a coalition to act in collective self interest, we are likely to have problems. Examples include self-imposed fishing restrictions, or restraint in watering one's lawn during a drought. One sees from the figure that since Left is always preferred, it always pays to cheat on one's coalition partners. This is known as free-riding, and is a very important temptation to avoid in any public policy endeavor, as we saw in the economic problems explored in chapter 12. For this reason, we see that any coalition is unstable. In fact, the only stable equilibrium is a suboptimal Nash equilibrium in which all choose Left, the quintessential prisoner's dilemma outcome.
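The coalition logic can be made concrete with a pair of straight, parallel payoff lines. The sketch below is an illustration only: the population of 100, the unit slope, and the constant 0.5 gap are assumptions picked so that the Right line crosses zero at the 50% point, as in the description above. They are not values read off the figure.

```python
N = 100  # population size (assumption)


def left_payoff(k):
    """Payoff to someone choosing Left when k of N people choose Right."""
    return k / N


def right_payoff(k):
    """Payoff to someone choosing Right when k of N people choose Right."""
    return k / N - 0.5


# Smallest coalition whose members beat the all-Left baseline of zero.
baseline = left_payoff(0)
min_coalition = next(k for k in range(1, N + 1) if right_payoff(k) > baseline)

# Free-riding: a member who reverts to Left leaves k - 1 people on Right
# and collects the Left payoff at that point.
always_tempted = all(left_payoff(k - 1) > right_payoff(k) for k in range(1, N + 1))

print(min_coalition)   # 51
print(always_tempted)  # True
```

With these assumed lines, a coalition of 51 is enough for its members to do better than they would if everyone chose Left, yet at every coalition size each member would gain by quietly switching back.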

In the case shown in the graph, the incentive to switch from Right to Left is constant, but it should be clear that it does not have to be. A variety of other curve shapes are explored below, and you can easily make up your own. Fitting such technically feasible curves to real-life situations can sometimes be a challenge, but it can be fun to try.

If we can meaningfully average the population's payoffs to develop some measure of "community welfare" (the dotted line in the figure is a simple weighted average), you will note that the population's average welfare increases as more choose Right. To maximize the collective good, all choosing Right is preferred---although never from the perspective of individuals. This has obvious social policy implications, since there is a conflict between those decisions that would be made by individuals and those that would be made on behalf of society.
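The weighted average behind the welfare line is easy to compute. This is a minimal sketch assuming straight, parallel payoff lines (Left pays k/N, Right pays k/N - 0.5, with N = 100); these numbers are illustrative assumptions, and the function name welfare is hypothetical.

```python
N = 100  # population size (assumption)


def left_payoff(k):
    """Payoff to a Left-chooser when k of N people choose Right."""
    return k / N


def right_payoff(k):
    """Payoff to a Right-chooser when k of N people choose Right."""
    return k / N - 0.5


def welfare(k):
    """Population-weighted average payoff when k of N people choose Right."""
    return ((N - k) * left_payoff(k) + k * right_payoff(k)) / N


best_k = max(range(N + 1), key=welfare)
print(best_k, welfare(best_k))  # 100 0.5

# At every split, an individual still earns more on Left than on Right.
print(all(left_payoff(k) > right_payoff(k) for k in range(N + 1)))  # True
```

Average welfare rises with every additional Right-chooser and peaks when all choose Right, while the individual comparison favors Left at every point. This is the conflict between individual and collective choice described above.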

It doesn't take too much imagination to see parallels between this and similar reward structures and a great many of the economic and public policy crises that seem to increasingly plague our world. It should be clear that these problems are built into the basic structure of the situation, and are not a product of individual selfishness---except that of the policy-makers and influencers that encourage the creation of rewards that lead to socially undesirable outcomes. The subprime debacle currently unfolding is but one dramatic illustration.

We now briefly return to the two-party prisoner's dilemma. In the figure below, the payoff curves have been lowered from those above to correspond to the punishments in the classic version of the story that involves jail sentences of various lengths.

Rather than a continuum, there are only three viable positions on the diagram: all left, the mid-point, and the far right. Point A is the situation when both parties (noted by blue and green on the diagram) have chosen Left, the preferred choice of informing (often called defection in the game theory parlance). At the mid-point, one criminal (blue) has chosen silence and the other (green) the role of informer, while at the far right both parties have chosen to stay silent, the cooperative choice that maximizes the collective good, as you can see from the dotted line.

As you can see, the individual payoff at C (representing omerta among criminals, and competitive restraint in the commercial case described above) is higher than that at A, but not as high as B1, the "light sentence" reward to a single informer. B2 is the raw deal of a "betrayal."

Starting from C, one sees that B1 is clearly better, and since we might assume that any savvy criminal will figure this out, and no one would want to get trapped in position B2 ("getting suckered"), defecting is clearly the right choice---but only for one individual.

Despite having started by recognizing the merits of C, one informs to avoid B2, hopes for B1, but settles for A, the Nash equilibrium.

It is truly beautiful!


Curves

It may have occurred to you that there is no particular reason why we must restrict ourselves to straight lines for the payoff functions, or to positive and equal slopes as shown in the previous two figures. The curves can be any shape we like, and they can cross---which would mean that the Left choice would not be universally preferred as it is in the prisoner's dilemma. Although it is clearly impractical to discuss all the myriad possibilities here, a few illustrations are useful to close out this section and stimulate your imagination.

The figure at right shows an instance in which the payoff functions cross, an important configuration. Right is the preferred choice when the number choosing Left is high, while Left is preferred when the Right choice is dominant. This is made explicit in the bottom part of the figure. Letter codes indicate which choice will be preferred, by how much (its y-axis value), and the direction the payoff differences will drive the equilibrium (arrows).

This pattern of payoffs exists with congestion problems. Some people prefer the choice that most other people are not making. They avoid busy highways, and will try to shop when crowds are light. Short-cuts work as long as they are sparsely traveled. When heavily used, you are often better off on the main roads.

Unlike the multi-party prisoner's dilemma, this new problem structure leads to an equilibrium at an intermediate point, not at an extreme. The precise equilibrium is approximately at the intersection of the two payoff curves. It corresponds to that point at which a single further switch from one curve to the other will no longer improve the payoff to the switcher.

Note that the decision-making process is discrete, not continuous. Every time a person switches, they move diagonally, since their behavior changes the L/R split as well as the curve they are on. The bottom part of the figure uses this type of calculation, although the resolution of the diagram is low, so the added precision is not obvious. If we assume straight line payoff curves in which L=ax+b and R=cx+d, then the equilibrium occurs at the integer value between a/(a-c) and c/(a-c) to the right of the intersection. In some cases, there may be two adjacent left/right proportions that allow back-and-forth switching without cost. As the population becomes large, of course, the equilibrium and the intersection become the same for all intents and purposes. (See Schelling (1978) page 226, note 15.)
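The discrete switching rule can be checked directly. The sketch below assumes that x counts the number of people choosing Right, that a switcher compares their current payoff with the payoff on the other curve after the switch, and that a > c; the coefficients and the population of 30 are illustrative assumptions.

```python
from fractions import Fraction

N = 30        # population size (assumption)
a, b = 3, 0   # Left payoff:  L(k) = a*k + b  (illustrative coefficients)
c, d = 1, 24  # Right payoff: R(k) = c*k + d  (illustrative coefficients)


def left_payoff(k):
    return a * k + b


def right_payoff(k):
    return c * k + d


def is_equilibrium(k):
    """No single person gains by switching when k of N people choose Right."""
    # A Left-chooser who switches raises the Right count to k + 1.
    left_stays = k == N or left_payoff(k) >= right_payoff(k + 1)
    # A Right-chooser who switches lowers the Right count to k - 1.
    right_stays = k == 0 or right_payoff(k) >= left_payoff(k - 1)
    return left_stays and right_stays


equilibria = [k for k in range(N + 1) if is_equilibrium(k)]

intersection = Fraction(d - b, a - c)         # where the two lines cross
lower = intersection + Fraction(c, a - c)     # c/(a-c) to the right
upper = intersection + Fraction(a, a - c)     # a/(a-c) to the right

print(equilibria)                  # [13]
print(intersection, lower, upper)  # 12 25/2 27/2
```

Here the lines cross at 12 but the resting point is 13, the one integer lying between c/(a-c) and a/(a-c) to the right of the intersection. Had the bounds themselves been integers, both would qualify, which is the costless back-and-forth case mentioned above.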

While the payoffs in this problem structure create an equilibrium away from an extreme, it is still inefficient. The dotted curve shows that the collective maximum is considerably to the right of the intersection, although short of the Right boundary. The exact value will vary with the slopes, and in some cases the collective maximum will be at the right extreme. This also has social policy implications because the natural equilibrium is not necessarily at the point that produces the overall best outcome. Without government intervention, such as regulation, only a radical restructuring of the rewards will achieve optimum welfare. Often, such changes in structure are impossible.


To illustrate another common real-world situation, we can go a step further in changing the curves. If we retain the general configuration of this problem but interchange the positions of the Left and Right curves so that Right is above Left on the right side of the diagram, we will find that two equilibria emerge, one at each extreme.

With this change, while all choosing Right is clearly superior to all choosing Left, both equilibria are stable. When all choose Left, there is no incentive for a few to change to Right unless they can be assured that many others will also do so.
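A small simulation shows how the starting point decides which extreme the population reaches. It is a sketch under stated assumptions: the straight-line payoffs and the population of 30 are illustrative, one person switches at a time, and a person switches only when doing so strictly improves their own payoff. The function name settle is hypothetical.

```python
N = 30  # population size (assumption)


def left_payoff(k):
    """Left payoff when k of N choose Right (illustrative line)."""
    return k + 24


def right_payoff(k):
    """Right payoff when k of N choose Right (illustrative line)."""
    return 3 * k


def settle(k):
    """Let one person at a time switch while switching strictly pays."""
    while True:
        if k < N and right_payoff(k + 1) > left_payoff(k):
            k += 1  # a Left-chooser gains by joining Right
        elif k > 0 and left_payoff(k - 1) > right_payoff(k):
            k -= 1  # a Right-chooser gains by returning to Left
        else:
            return k


print(settle(10))  # 0
print(settle(11))  # 30
print(left_payoff(0), right_payoff(N))  # 24 90
```

With these assumed lines, a group of 10 Right-choosers unravels back to all Left, while 11 is enough to tip everyone to Right. Both extremes are resting points, even though everyone earns 90 at all Right and only 24 at all Left.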

As a practical example, consider one's being stuck with inefficient technical standards, an unfortunately common occurrence. Being out of step with others is generally less attractive than living with the inefficiency, so there is little incentive to seek a change despite the recognition that all would be better off if everybody did so. When one combines these ideas with the notion of path dependency, where outcomes depend on the particular early steps in evolution, one can see both how such undesirable situations arise, and why they are so difficult to change.

The careful reader may have noticed that the portion of the curves to the left of the intersection creates incentives similar to those in the multi-party prisoner's dilemma, even though the curves are not parallel. As such, there is some size coalition that will benefit from choosing Right, even if their number is insufficient to get everyone to switch. Of course, coalition members will always be tempted to defect, since Left has higher payoffs when the coalition is below the critical mass. Unlike the earlier case, however, there exists some coalition size that can change the incentives for all, thus creating a global switch and a general increase in community welfare at the Right extremity. A good example is the switch from conventional to digital photography. For many years, there existed some community that could sensibly utilize the new technology, even if many others did not. At a certain point, however, it becomes best for everyone to make the switch. Now, more than thirty-five years after the first digital camera was produced at Eastman Kodak, most of us have.

The final example displays a somewhat more complex payoff structure, although it is one we might well discover in everyday life. Similar to other problems we have discussed, the Left solution is preferred to the Right when few have chosen Right. However, Right is preferred in the middle of the range, although it loses its allure when many have chosen it.

You can see from the lower part of the diagram that this problem structure creates two equilibria. At proportions of Right choice to the left of the left-most intersection (about 24%), Left is preferred. To the right of this intersection---similar to the previous example---the Right-choosing coalition will dominate until its membership grows to the second intersection. This represents the second equilibrium. From this point, one is better off choosing Left and being in the minority.
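The two resting points can be reproduced with a flat Left line and a humped Right curve. The sketch is illustrative only: the curve shapes, the crossings placed at 23.5 and 80.5 people out of 100 (the second crossing in particular is an assumption, since the text gives only the first), and the one-switch-at-a-time rule are all assumptions.

```python
N = 100  # population size (assumption)


def left_payoff(k):
    """Flat Left payoff, whatever the number k choosing Right (assumption)."""
    return 0


def right_payoff(k):
    """Humped Right payoff that crosses the Left line at k = 23.5 and k = 80.5."""
    return -(2 * k - 47) * (2 * k - 161)


def settle(k):
    """Let one person at a time switch while switching strictly pays."""
    while True:
        if k < N and right_payoff(k + 1) > left_payoff(k):
            k += 1  # a Left-chooser gains by joining Right
        elif k > 0 and left_payoff(k - 1) > right_payoff(k):
            k -= 1  # a Right-chooser gains by returning to Left
        else:
            return k


resting_points = sorted({settle(start) for start in range(N + 1)})
print(resting_points)                        # [0, 80]
print(settle(10), settle(30), settle(95))    # 0 80 80
```

Starting below the first crossing, the Right-choosers drift back to zero. Starting anywhere above it, including from near-universal participation, the population settles at the second crossing.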

Schelling cites committee membership as an example, doubtless from his extensive experience in the university. At low levels of participation (Right means coming to the meetings, Left means not coming), there is little benefit in showing up, because few do.

In the middle range, participation has meaning and one's vote makes a difference, so people increasingly come, thus encouraging others to do so. When almost everyone comes, attendance creates marginal returns, and one is better off letting the work be done by others. There are likely to be many colleagues to represent your views and give you a summary of what was said.

A similar situation might well arise in U.S. Presidential elections, since the Electoral College system creates unusual incentives for individual voters. When one's candidate commands a tiny minority in one's state, one's marginal vote makes little difference to the ultimate outcome. As more people vote for your candidate, however, there is increasing value to voting because the chance of shifting the state's electoral votes increases. These incentives hold until the win is assured, at which point there is little benefit in voting because one's vote cannot contribute to the outcome once the commitment of one's state's electoral votes is assured.


Conclusion

While game theory and Schelling curves are clearly not the only ways to look at distributed decision-making, they illuminate many classes of problems, and understanding the shape of payoff functions is a remarkably powerful discipline no matter what the problem.