Previous posts examined the replicator equation as the basis of agent behavior in an arms race defined by a game of rock-paper-scissors (RPS). This post begins a follow-on examination of best-response as an alternative behavior or strategy for competing agents. In some ways, best-response may be unrealistic with respect to how agents can adapt, especially if it lets them make large jumps or changes in strategy over short periods of time that do not reflect how real-world organizations behave. In other ways, however, the strategy is quite realistic for social actors because it lets them revive dead or eliminated strategies when suitable, whereas the biologically based replicator loses strategies forever once they go extinct.
In international relations, the differences between the replicator equation and best-response were alluded to in neorealist discussions of arms races. Kenneth Waltz argued that states seek to imitate the capabilities of leading great powers, thus pushing the system towards homogeneity:
The fate of each state depends on its responses to what other states do. The possibility that conflict will be conducted by force leads to competition in the arts and the instruments of force. Competition produces a tendency toward the sameness of competitors… Contending states imitate the military innovations contrived by the country of greatest capability and ingenuity. And so the weapons of major contenders, and even their strategies, begin to look much the same all over the world.
… The expectation is not that a balance once achieved, will be maintained, but that a balance, once disrupted, will be restored in one way or another. Balances of power recurrently form. Since the theory depicts international politics as a competitive system, one predicts more specifically that states will display characteristics common to competitors: namely, that they will imitate each other and become socialized to their system.
By contrast, John Mearsheimer argued that states will seek to counter the capabilities and strategies of the great powers (or that great powers will seek to counter one another). Whereas Waltz saw the international system moving towards homogeneity, Mearsheimer argued that it would move towards heterogeneity as competing states sought to develop military capabilities that would defeat their rivals.
Kenneth Waltz has made famous the argument that security competition drives great powers to imitate the successful practices of their opponents. States are socialized, he argues, to “conform to the common international practices.” Indeed, they have no choice but to do so if they hope to survive in the rough-and-tumble of world politics… The result of this tendency toward sameness is clearly maintenance of the status quo. After all, balancing is the critical conforming behavior, and it works to preserve, not upset, the balance of power. This is straightforward defensive realism.
…But Waltz overlooks two closely related aspects of state behavior that make international politics more offensive-oriented and more dangerous than he allows. States not only emulate successful balancing behavior, they also imitate successful aggression. For example, one reason that the United States sought to reverse Saddam Hussein’s conquest of Kuwait in 1990-91 was fear that other states might conclude that aggression pays and thus initiate more wars of conquest.
Furthermore, great powers not only imitate each other’s successful practices, they also prize innovation. States look for new ways to gain advantage over opponents, by developing new weapons, innovative military doctrines, or clever strategies. Important benefits often accrue to states that behave in an unexpected way, which is why states worry so much about strategic surprise.
To understand these differences, it is useful to consider a simple example of RPS again. Under the replicator dynamic, a player that tends to play rock in a room full of competitors predominantly playing paper will imitate the crowd and start to play paper with higher frequency, and may never discover the option of scissors if it was previously eliminated from the population. Indeed, this is why the model examined in previous postings was capable of getting stuck in corners defined by pure strategies. By comparison, a player using best-response in the same scenario will shift to playing scissors, knowing that scissors is the best possible answer to rivals that play paper in RPS. Moreover, best-response assumes that the option of scissors is not historically contingent: agents can choose it even if it was eliminated from the population beforehand. Thus, while the replicator dynamic and best-response are both adaptive, they constitute different logics with respect to how agents learn and change as a result of their experiences.
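The contrast can be made concrete with a small sketch. The payoff matrix below is the standard zero-sum RPS matrix (win = 1, loss = -1, tie = 0), and the function names `replicator_step` and `best_response`, the step size `dt`, and the example population are all assumptions made for illustration; they are not taken from the model in the earlier posts.

```python
import numpy as np

# Assumed zero-sum RPS payoffs. Rows are the player's own move and
# columns the opponent's move, in the order rock, paper, scissors.
PAYOFF = np.array([[0.0, -1.0, 1.0],
                   [1.0, 0.0, -1.0],
                   [-1.0, 1.0, 0.0]])

def replicator_step(shares, dt=0.1):
    """One Euler step of the replicator equation on population shares."""
    fitness = PAYOFF @ shares          # payoff of each pure strategy
    average = shares @ fitness         # population-average payoff
    # Growth is proportional to the current share, so a share of
    # exactly zero can never recover.
    return shares + dt * shares * (fitness - average)

def best_response(opponent_mix):
    """Index of the pure strategy with the highest expected payoff."""
    return int(np.argmax(PAYOFF @ opponent_mix))

# Hypothetical population: mostly paper, some rock, scissors extinct.
population = np.array([0.2, 0.8, 0.0])
after_step = replicator_step(population)
choice = best_response(population)
```

In this example the replicator step raises the paper share and leaves scissors at zero, while `best_response` returns index 2 (scissors) even though no one in the population currently plays it.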
From a computational perspective, best-response can be implemented as a simple linear program that searches across all possible combinations of rocks, papers, and scissors and identifies the mix that produces the highest payoff given an agent's particular circumstances. While more sophisticated ways of searching across alternatives may exist, a brute-force search is relatively simple to code and doesn’t cost much computationally for a three-dimensional space defined by the settings for rocks, papers, and scissors.
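A brute-force search of this kind might look like the following sketch. It assumes the standard zero-sum RPS payoff matrix and a payoff that is linear in the agent's own mix; the function name `brute_force_best_response`, the grid resolution `steps`, and the example opponent mix are hypothetical and are not drawn from the model's actual code.

```python
from itertools import product

import numpy as np

# Assumed zero-sum RPS payoffs (rows: own move, columns: opponent move,
# in the order rock, paper, scissors).
PAYOFF = np.array([[0.0, -1.0, 1.0],
                   [1.0, 0.0, -1.0],
                   [-1.0, 1.0, 0.0]])

def brute_force_best_response(opponent_mix, steps=10):
    """Search a grid of (rock, paper, scissors) mixes for the best payoff.

    The grid holds every mix whose weights are multiples of 1/steps and
    sum to one, which is (steps + 1) * (steps + 2) / 2 candidates.
    """
    opponent_mix = np.asarray(opponent_mix, dtype=float)
    best_mix, best_payoff = None, -np.inf
    for rock, paper in product(range(steps + 1), repeat=2):
        scissors = steps - rock - paper
        if scissors < 0:
            continue  # weights would exceed one; not a valid mix
        mix = np.array([rock, paper, scissors], dtype=float) / steps
        payoff = mix @ PAYOFF @ opponent_mix  # expected payoff of this mix
        if payoff > best_payoff:
            best_mix, best_payoff = mix, payoff
    return best_mix, best_payoff

# Hypothetical opponent that mostly plays paper.
mix, value = brute_force_best_response([0.2, 0.8, 0.0])
```

With `steps=10` the search evaluates only 66 candidate mixes. Under the linear-payoff assumption used here the winner is always a pure strategy (all scissors in this example), so the grid mainly pays off if an agent's payoff includes nonlinear terms such as costs or constraints.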
Future postings will examine the implementation of this strategy in detail, compare the performance of the model when agents employ best-response versus the replicator dynamic, and discuss the implications for arms races, given that best-response addresses many of the limitations of the biological foundation of the previous versions of the model.