Moral Sentiments and Material Interests - The Moral Economy: Why Good Incentives Are No Substitute for Good Citizens - Samuel Bowles

The Moral Economy: Why Good Incentives Are No Substitute for Good Citizens - Samuel Bowles (2016)

III. Moral Sentiments and Material Interests

An e-mail to me about the experiments that I report in this chapter recalled the “exciting and stimulating times” that my correspondent spent in the early 1950s as a young staffer in the Executive Office of the President. “People worked long hours,” he told me, “and felt compensated by the sense of accomplishment, and … personal importance. Regularly a Friday afternoon meeting would go on until 8 or 9, when the chairman would suggest resuming Saturday morning. Nobody demurred. We all knew it was important, and we were important… . What happened when the President issued an order that anyone who worked on Saturday was to receive overtime pay … ? Saturday meetings virtually disappeared.”

The e-mail was from Thomas Schelling, who, half a century after he left the White House, was awarded the Nobel Prize for convincing economists that their discipline should broaden its focus to include social interactions beyond markets. Was the young Schelling’s experience in the Executive Office of the President atypical?

Incentives work. They often affect behavior almost exactly as conventional economic theory predicts, that is, by assuming that the target of the incentive cares only about his material gain. Textbook examples include the response to incentives by Tunisian sharecroppers and American windshield installers.1 In these cases, the assumption of material self-interest provides a good basis for predicting the effect of varying incentives to increase the payoff of working harder. Their work effort was closely aligned with the extent to which their pay depended on it.

But whiteboard economics sometimes fails. Overtime pay did not induce Schelling and the other White House staffers to happily show up on Saturdays. Substantial rewards for high school matriculation in Israel had no impact on boys and little effect on girls, except among those already quite likely to matriculate.2 Large cash payments in return for tested scholastic achievement in 250 urban schools in the United States were almost entirely ineffective, and incentives for student inputs (reading a book, for example) had modest effects.3 In an unusual natural experiment, the imposition of fines as a way to shorten hospital stays in Norway had the opposite effect.4 In contrast, hospital stays in England were greatly reduced by a policy designed to evoke shame and pride in hospital managers rather than to rely on the calculus of profit and loss.5

Jewish West Bank settlers, Palestinian refugees, and Palestinian students were asked how angry and disgusted they would feel, or how supportive of violence they might be, if their political leaders were to compromise on contested issues between the groups.6 Those who saw their group’s claims (regarding the status of Jerusalem, for example) as reflecting “sacred values” (about half in each of the three groups) expressed far greater anger, disgust, and support for violence if, in exchange for the compromise, their group received monetary compensation.

A similar reaction may explain Swiss citizens’ response to a survey gauging their willingness to accept an environmental hazard: when offered compensation, they became more resistant to the local construction of a nuclear waste facility.7 Many lawyers believe (and experimental evidence suggests) that inserting explicit provisions covering breach of contract increases the likelihood of breach.8

These examples cast doubt on the classical separability assumption that incentives and moral sentiments are simply additive in the implementation of desirable outcomes. I show in this chapter that laboratory experiments, played for significant sums of money, typically among anonymous subjects, suggest that moral and other noneconomic motives are sometimes crowded out by explicit incentives.

At the end of the previous chapter, I charged Aristotle’s Legislator with the task of designing public policies in light of this crowding-out problem. The fact that appeals to material self-interest sometimes compromise moral sentiments would not worry the Legislator if there were little such sentiment to crowd out. But this is not the case. Natural observation and experimental data indicate that in most populations, few individuals are consistently self-interested; moral and other-regarding motives are common. Moreover, we will see that these experiments predict what people do outside the lab. Among Brazilian fishermen, for example, those who cooperate in a public-goods experiment onshore adopt more environment-friendly traps and nets when they take to their boats.

Homo socialis

In the Prisoner’s Dilemma game, defecting rather than cooperating with one’s partner maximizes a player’s payoff, irrespective of what the other player does. Defecting in this game is what game theorists call a dominant strategy, and the game is extremely simple; it does not take a game theorist to figure this out. So, assuming that people care only about their own payoffs, we would predict that defection would be universal.

But when the game is played with real people, something like half of players typically cooperate rather than defect.9 Most subjects say that they prefer the mutual cooperation outcome over the higher material payoff they would get by defecting on a cooperator, and they are willing to take a chance that the other player feels the same way (and is willing to take the same chance).

When players defect, it is often not because they are tempted by the higher payoff they would get, but because they know that the other player might defect, and they hate the idea that their own cooperation would be exploited. We know this from what happens when the Prisoner’s Dilemma is played not simultaneously (the standard protocol, in which each person decides without knowing what the other will do) but sequentially (one person, chosen randomly, moves first). In the sequential game, the second mover usually reciprocates the first player’s move, cooperating if the first has done so and defecting otherwise. Keep in mind that avoiding being a chump, not the prospect of a higher payoff, appears to be the motive here. We will return to it.

The experiments discussed in this and later chapters are listed in table 3.1 (the pages on which they are most fully described are given in the index). A more detailed, technical description of the games is in appendix 2.

The Prisoner’s Dilemma is not the only game in which experimental subjects routinely violate the self-interest assumption.10 Using data from a wide range of experiments, Ernst Fehr and Simon Gaechter estimate that 40 percent to 66 percent of subjects exhibit reciprocal choices, meaning that they returned favors even when not doing so would have given them higher payoffs. The same studies suggest that 20 percent to 30 percent of the subjects exhibit conventional self-regarding preferences.11 In Armin Falk and Michel Kosfeld’s Trust game (described below), fewer than a fifth of experimental subjects made self-interested choices.

Table 3.1. Values indirectly measured in experimental games



Note: The indicated values provide plausible explanations of experimental behavior when this differs from behavior expected of an individual seeking to maximize game payoffs (and believing others to be doing the same). Appendix 2 gives more detail on the structure of these games.

George Loewenstein and his coauthors distinguished three types of players in the experimental games they conducted: “Saints consistently prefer equality, and they do not like to receive higher payoffs than the other party even when they are in a negative relationship with the opponent … Loyalists do not like to receive higher payoffs in neutral or positive relationships, but seek advantageous inequality when … in negative relationships … Ruthless competitors consistently prefer to come out ahead of the other party regardless of the type of relationships” (emphasis in the original).12 Of their subjects, 22 percent were saints, 39 percent were loyalists, and 29 percent were ruthless competitors. The remaining 10 percent did not fit into these categories.

As with Tolstoy’s happy families, in this and other games there seems to be just one way to be self-interested—like Loewenstein’s ruthless competitors—but many ways to depart from the standard economic model. Some are unconditionally altruistic, simply valuing the benefits received by others. Some express a conditional form of altruism: they reciprocate good deeds even when they cannot expect to benefit in any way. Others dislike inequality, apparently out of a commitment to justice. While Homo economicus is among the dramatis personae on the economic stage, experiments show that he is often seriously outnumbered.

I use the term “social preferences” to refer to motives such as altruism, reciprocity, intrinsic pleasure in helping others, aversion to inequity, ethical commitments, and other motives that induce people to help others more than is consistent with maximizing their own wealth or material payoff. Social preferences are thus not limited to cases in which an actor assigns some value to the payoffs received by another person. I use a broader definition because moral, intrinsic, or other reasons unrelated to a concern for another’s payoffs or well-being often motivate people to help others and adhere to social norms even when it costs them to do so. For example, one may adhere to a social norm not because of the harm that a transgression would do to another, but because of the kind of person one would like to be. Helping the homeless may be motivated by what James Andreoni calls the “warm glow” of giving rather than a concern for the poor.13 Being honest need not be motivated by the harm that the lie would do to others; it may be an end in itself.

Knowing that incentives may undermine one or more of these dimensions of social preferences provides a warning to the Legislator, but not much guidance. How do we design incentives and other policies in the presence of crowding out? To decide whether to use incentives, and if so, what kind, the Legislator has to know more about citizens’ behavior in the absence of incentives and their response to the kinds of incentives that might be put in place. This requires an understanding of how incentives work and why they sometimes fail.

Crowding Out (and In)

To begin, the Legislator considers a paradigmatic problem facing policy makers: how to get citizens to contribute to some public good when it costs them to do so. This can be represented as a Public Goods game. An individual may choose to bear a cost in order to take an action—such as disposing of trash in an environment-friendly manner—that furthers some public good. The individual herself, like all citizens, will benefit from the public good, but let us assume that her cost in contributing is greater than the benefit she personally will receive. Thus, while the best outcome is for everyone to contribute (it maximizes the total payoff for the public), for each citizen, not contributing at all is the individually payoff-maximizing choice, no matter what the other citizens do. Not contributing is the dominant strategy for a payoff maximizer, just as it is in the Prisoner’s Dilemma. Contributing is a form of altruism, that is, helping others at a cost to oneself.

The Public Goods game is thus a version of the Prisoner’s Dilemma with more than two players. Many other problems take the same form: the voluntary payment of taxes, limiting one’s carbon footprint, upholding social norms, producing new knowledge in the public domain, maintaining public safety, and acting to maintain the good reputation of one’s group.
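The incentive structure just described can be made concrete with a standard linear payoff function (the numbers and the function below are my own illustration, not from the book):

```python
def payoff(own, others, endowment=10, r=0.5):
    """Payoff in a linear Public Goods game (illustrative numbers only).

    Each player keeps whatever she does not contribute and receives a
    share r of every unit contributed by anyone.  With 1/n < r < 1
    (here n = 4 players, r = 0.5), keeping a unit always beats
    contributing it, yet the group does best when everyone contributes.
    """
    total = own + sum(others)
    return endowment - own + r * total

# Not contributing dominates, whatever the others do:
# payoff(0, [10, 10, 10]) -> 25.0, versus payoff(10, [10, 10, 10]) -> 20.0
# Yet universal contribution maximizes the group's total payoff:
# 4 * 20.0 = 80 if all contribute, versus 4 * 10.0 = 40 if none do.
```

The demonstration in the comments is exactly the structure of the text: full contribution maximizes the total payoff, but not contributing is each individual’s dominant strategy.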

The citizen may be encouraged to contribute to the public good by a subsidy or other economic incentive. In what follows, I will use the term “incentive” (without the adjectives explicit, economic, monetary, and so on) to mean an intervention that affects the expected material costs and benefits associated with an action. In the standard economic model, the story ends here: the subsidy reduces the net cost of contributing to the public good, and as a result, more citizens will contribute or they will contribute more.

But some citizens have social preferences too, and these may motivate actions that benefit others even at a cost to oneself. How salient these preferences are relative to the citizen’s self-regarding material motivations will depend on the situation in which the decision to contribute is made. Shopping and voting, for example, are different situations, and for most people the pursuit of self-interest is less likely to be considered an ethical shortcoming when shopping than when voting. In the case of the public good, the motives that are salient will depend on how the contribution is framed, including whether an incentive is provided to those who contribute. The incentive is part of the situation. I term these proximate motives to contribute the citizen’s “experienced values.”

The challenge facing the Legislator is that the framing provided by the incentive may affect the salience of the individual’s social preferences, resulting in a level of experienced values different from what would have been the case in the absence of the incentive. When this occurs, social preferences and incentives are not separable, and experienced values may be influenced (positively or negatively) by the use of incentives.

To see how, for any particular individual let the extent of the contribution be represented by a single number, and let the same be true of both explicit incentives and values. Nonseparability occurs when the presence or extent of the incentive affects the individual’s experienced values.

Figure 3.1 shows the two possibilities. Panel A depicts separability: the upper route from incentives to the contribution—the pathway via “cost of contribution net of incentive”—is the one stressed by the self-interest paradigm. The costs are a deterrent to contribution (shown by the minus sign on the arrow from cost to contribution). On this causal route, the incentive reduces the net cost of the public-spirited action and thus increases the actor’s motivation to provide the public good.

The lower set of arrows in panel A—passing through “experienced values”—shows the effect of the citizen’s social preferences on experienced values, and the effect of experienced values on the contribution to the public good, which is simply added to the effect of the subsidy. The effect of varying the incentive does not depend on the level of the social preferences, and correspondingly, the effect of varying the social preferences does not depend on the level of the incentive. This is what additivity (or separability) means.

Of course, the self-interest paradigm may simply ignore the role of social preferences or even assume them to be absent. But as long as panel A is a good representation of the process of contribution, no harm is done by the omission, because the effect of the incentive is independent of the level of social preferences. The economists’ policies will work out as expected, even though Homo economicus is a misnomer for citizens who might better be termed Homo socialis.


Figure 3.1. Incentives, experienced values, and contributions to the public good: The problem of nonseparability and crowding out. Arrows are positive or negative causal effects. Crowding out occurs in panel B when there is a negative effect (−) from “Incentive” to “Experienced values.”

Panel B illustrates the problem of nonseparability, which arises when this is not the case, because incentives have a negative effect on the individual’s experienced values and hence indirectly have a negative effect on the citizen’s contribution to the public good. Economists following John Stuart Mill in focusing on the citizen “solely as a being who desires to possess wealth” routinely ignore this indirect effect, either because they think it is not there or because it is not a part of economics. But it is definitely there, and because it affects how incentives work, it has to be part of economics.

Because of the effect of incentives on experienced values, the total—direct and indirect—effect of an incentive may fall short of what we would expect if we looked only at its effects on the costs and benefits of the targeted activity. In this case, we say that incentives crowd out social preferences. Then, incentives and social preferences are substitutes: the effect of each on the targeted activity declines as the level of the other increases. Where the effect on social preferences is positive, we have the synergy that the Legislator seeks: crowding in occurs, and social preferences and incentives are complements, each enhancing the effect of the other.

The total effect of the introduction of an incentive on the public-goods contribution by an individual is the sum of the direct effect of the subsidy (which must be positive) plus the indirect effect of the subsidy operating via its effect on values (which may be of either sign) and the effect of values on the action (which we assume to be positive). We have separability when there is no indirect effect, either because social preferences are absent or because incentives do not affect their behavioral salience as expressed in “experienced values.” This appears to be true of the effects of incentives on the work activity of American windshield installers, Tunisian farmers, and the other “textbook” cases mentioned at the beginning of the chapter.

Where the indirect effect is negative, meaning that the total effect falls short of the direct effect, incentives and social preferences are substitutes (they are “sub-additive,” or are said to exhibit “negative synergy” or “crowding out”). This may have been true of the surprisingly modest or even absent effects of financial rewards for schoolwork mentioned at the outset.

Where the indirect effect is negative and large enough to offset the direct effect of the incentive, we have the attention-riveting cases in which incentives backfire, that is, they have the opposite of the intended effect, which I term “strong crowding out.” The Boston firemen’s and Haifa parents’ responses to incentives are examples.

Where the indirect effect is positive, we have crowding in, that is, synergy between the two effects: then incentives and social preferences are complements rather than substitutes, and are sometimes termed “superadditive.” (These four cases—separability, crowding out, crowding in, and strong crowding out—are characterized mathematically in appendix 1.)
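In my own shorthand (the book’s formal treatment is in its appendix 1), the four cases can be written compactly. Let the contribution be $C(s, v)$, where $s$ is the incentive and $v = v(s)$ the experienced values that the incentive may alter:

```latex
\frac{dC}{ds} \;=\;
\underbrace{\frac{\partial C}{\partial s}}_{\text{direct effect}\,>\,0}
\;+\;
\underbrace{\frac{\partial C}{\partial v}\,\frac{dv}{ds}}_{\text{indirect effect via values}}
```

Separability is the case $dv/ds = 0$; crowding out, $dv/ds < 0$ with the total effect still positive; strong crowding out, an indirect term large enough to make the total effect negative; and crowding in, $dv/ds > 0$.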

How do we detect the crowding phenomenon in experiments? If an incentive were actually to reduce, rather than increase, contributions to the public good, we would surely have evidence of crowding out. But this kind of strong crowding out is just an extreme manifestation of the problem, so simply observing that an incentive has a positive effect is not evidence that crowding out is absent. Where crowding out is present but not “strong,” the effect of the incentive will be in the intended direction, but not as large as it would have been if social preferences and incentives were simply additive. Under separability, the response to a change in the incentive would be exactly the response of an entirely amoral and self-regarding person. So to test for the presence, nature, and extent of social preferences and their crowding out (or in), we use the predicted effectiveness of the incentive for such an exemplar of Homo economicus as our benchmark. Behavior deviating from the benchmark is evidence of social preferences and of their nonseparability from material incentives.

Here is an example of a subsidy that “worked” but induced almost entirely selfish behavior in people who, without the incentive, acted quite unselfishly. Juan Camilo Cardenas and his coauthors implemented an experimental “public bads” game called the Common Pool Resource game, which is very similar in structure to the real-world commons problem faced by his subjects—rural Colombian ecosystem users.14

In the experiment, Cardenas let the villagers choose how many “months” they would spend extracting resources from the hypothetical “forest” (the common pool resource). There was a level of exploitation (one month per year) that, if practiced by all, would maximize the total payoffs to the group. But in the experiment that Cardenas implemented, each individual would do better by extracting much more than this social optimum. The villagers immediately recognized the analogy between the experimental game, with its hypothetical forest, and their everyday challenges of making a livelihood from the real forest. There was nothing hypothetical about their payoffs in the experiment; they would earn substantial sums of money if they managed to cooperate.

This setup is similar to the Public Goods game, except that overextracting resources is a “public bad”: each subject in the experiment would earn higher material payoffs by overexploiting the “forest,” irrespective of what the others did. But collectively, they would do best if each limited his or her extraction. The villagers could easily determine the payoffs that they would get for every combination of what they and the others did. Each villager was randomly assigned to one of fourteen groups in which they would play the experiment over a number of periods.

Cardenas and his coauthors followed two conventional practices in behavioral economics. First, the payoffs were real; some subjects went home with substantial sums of money. Second, participants played anonymously; even in treatments allowing for communication among the players, how much each extracted from the forest was known only to the experimenter and to the player herself.

In the first stage of the experiment, lasting eight periods, there were no incentives and no communication among the villagers. The villagers, on average, extracted 44 percent less of the experimental “resource” than the amount that would have maximized their individual payoffs. Cardenas and his coauthors then cleverly used this statistic—the difference between how much a villager extracted from the “forest” and the amount of extraction that would have gained her the greatest material payoff given what everyone else did—to measure each individual’s social preferences. The rationale for their interpreting the statistic in this way was that her social preferences provide a plausible and parsimonious explanation of why she did not maximize her own material gain. The evidence from the first stage of the experiment, then, suggested that social preferences were quite common among the villagers.

But while striking, this was not the answer to the question Cardenas was asking. He wanted to know how either material incentives or communication among the subjects affected their extraction levels and, hence, what he could infer about the conditions affecting the social preferences of the villagers. Here is how he answered the question.

In the second stage of the game, with nine periods of play, Cardenas introduced two new treatments. In nine of the groups, the villagers were allowed to communicate with each other briefly before playing anonymously. These groups extracted a bit less under the communication treatment than they had in the no-communication stage, thus deviating even a bit more from what a person who cared only about her own payoffs would do. Apparently, communication among the villagers somewhat enhanced the behavioral salience of their social preferences.

The experimenter explained to the members of the remaining five groups that they would have to pay a small fine (imposed by the experimenter) if it was found that they had extracted more of the resource than the amount that, had all members done the same, would have maximized the payoffs to the group. Call this amount the “social optimum” extraction level. To determine whether members had overextracted in this sense, they would be monitored (which would occur with a probability known to the villagers).

As expected, villagers in these groups initially extracted much less than those without the fine, showing that the penalty had the intended effect. But as the second stage of the experiment progressed, those in the groups subject to the fine raised their extraction levels. The prospect of the fine reduced how much an entirely selfish person would extract; but what Cardenas wanted to know was the effect of the incentive on the social preferences of the villagers, that is, on how much they deviated from what an entirely selfish person would do.

The result was a shocker: by the end of the second stage, their levels of extraction were barely (and not statistically significantly) less than what an entirely self-interested person would do. Remember: these are the very same villagers who in stage one, without incentives, extracted barely more than half of what would have maximized their personal gain.

Figure 3.2 shows the extent to which the villagers extracted less than would have maximized their personal gain for the two stages and two treatments (communication, fine) in the second stage of the experiment. The height of each dot thus indicates the extent of their social preferences. The incentive apparently worked, but it almost entirely sidelined whatever motives had led the villagers, in the absence of the incentive, to forgo substantial individual gain by limiting their extraction levels for the benefit of the group. In other words, the fine worked as a substitute for the villagers’ preexisting social preferences rather than as an additional reason to protect the “forest.”


Figure 3.2. The effects of communication and economic incentives on the strength of social preferences. In Stage I, both groups experienced the same treatment (no fines, no communication). In Stage II, one group of subjects (“Communication”) was allowed to discuss the game and what they should do (play remained anonymous), while members in the other group (“Fines”) were subjected to monitoring and a fine for overextraction (no communication). (Data from Cardenas, Stranlund, and Willis 2000.)

For the moment, let’s hold in abeyance questions about what would have happened if the fine had been large enough and the monitoring efficient enough to ensure that the villagers would have extracted exactly the social optimum amount from the forest, even though they had become entirely self-interested (having left their social preferences behind). The point here is that incentives work, but possibly with some collateral cultural damage. In chapter VI, I consider why we should worry about such damage.

Crowding Out: A Taxonomy for the Legislator

What was it that eclipsed the villagers’ green-mindedness once the fines were announced? It seems that like the Haifa parents, the villagers took the fine to be the price of transgressing what had previously been a social norm; and they found the gains to be had by overextracting from the “forest” sufficient to justify the risk of being fined. But we do not really know, because the experiment measured what the villagers did, not what they were thinking and feeling about exploiting and maintaining their “forest.”

But without understanding why the introduction of the fine sidelined social preferences among the villagers, it is difficult to see how this problem could be avoided. Learning more about the crowding-out process, therefore, is the next challenge for the Legislator. He knows that he will eventually have to understand how the incentive affected what the villagers were thinking and feeling when they made their decisions, but for now he thinks that a taxonomy of crowding effects might allow him to extract a bit more information from experiments like those of Cardenas and his coauthors.

On the basis of how the Colombian villagers reacted, we may suspect that a person who is happy to give to a charity may be less inclined to contribute when a donation reduces her tax bill. But what is it that triggers this change? Is it the mere presence of the tax break (whatever its magnitude) that changes the meaning of the gift? Or is it the magnitude of the subsidy?

When the presence of the incentive (rather than its extent) is what affects the person’s experienced values, we call this “categorical crowding out.” When the extent of the incentive matters, we say that “marginal crowding out” has occurred. We will see that crowding in may also occur—that is, when an incentive enhances the experienced values of the individual—and this too may be either categorical or marginal.

The distinction between marginal and categorical crowding out might have helped the Boston fire commissioner avoid the Christmas call-in debacle. Thinking back, he probably realized later that a large-enough penalty for sick call-ins would have had the effect he wanted, even though his lesser penalties backfired. This would be the case if the crowding-out problem he faced were categorical (the mere presence of the penalties was the problem) rather than marginal.

To clarify these concepts, figure 3.3 shows the possible effects of a subsidy on a person’s contribution to a public good when either categorical or marginal crowding out holds, and when crowding does not occur, that is, under separability. For each level of the subsidy (measured on the horizontal axis), the height of the line gives the level of the contribution—called the individual’s best response to the given subsidy—that will maximize his utility (both from the incentives and from his values). For example, the line labeled “self-regarding contribution” gives the best response of some hypothetical self-regarding individual (with no social preferences to crowd out). He contributes a small amount simply out of self-interest and then contributes more in response to the subsidy. These lines are termed “best response functions”; their slopes are the effect of the subsidy on the level of contributions. (I have drawn these as straight lines, but that is a simplification.)


Figure 3.3. Citizen’s contribution to the public good under the nonseparability of incentives and values. Under separability (top line), experienced values and incentives are additive. Categorical crowding out shifts this line downward (s = ε means that the subsidy is offered but is as small as can be imagined, ε representing an arbitrarily small number). Under strong (marginal) crowding out, the use of the incentive is counterproductive at all levels of the subsidy, as the downward-sloping line shows. Under categorical crowding out, subsidies less than s’ are also counterproductive, in the sense that contributions with the incentive are less than they would have been in the absence of incentives.

Look at the top line (labeled “separability”) as a point of reference. This depicts another hypothetical individual, similar to the one modeled in panel A of figure 3.1. The social preferences and resulting experienced values of the individual induce her to contribute a substantial amount to the public good even when there is no subsidy (the vertical intercept of the top line). The slope of the top line is the effect of the subsidy when marginal crowding out is absent (because separability precludes this). The line labeled “marginal crowding out” is less steep, indicating that marginal crowding out has reduced the effectiveness of variations in the subsidy in altering the amount contributed. When strong marginal crowding out holds (the downward-sloping line), the effect is negative (a greater subsidy induces a lesser contribution to the public good). Marginal crowding in would be indicated by a line steeper than the separability line (not shown).

The intercept at the vertical axis of the best-response function gives the citizen’s contribution in the absence of any subsidy (the other-regarding citizen contributing more than the self-regarding one when there is no subsidy). The intercept labeled “other-regarding contribution when s = ε” gives the contribution when a subsidy is offered but is very small (ε means a number as close to zero as you wish, but not zero). The difference in the vertical intercepts under separability and categorical crowding out shows the extent to which the mere presence of a subsidy diminishes social preferences.

Figure 3.3 would provide the Legislator with just the information he needs were he charged with selecting the subsidy. For each subsidy, it shows the contribution to the public good that could be expected depending on the nature and extent of crowding out. If estimates of the best-response function showed that citizens were other-regarding and that subsidies would create strong crowding out, the Legislator would stop using incentives. If the Legislator knew that an incentive would categorically crowd out social preferences, then he would either implement a subsidy larger than s’ in the figure, or no subsidy at all. Any subsidy between 0 and s’, he would see from the figure, would result in lower contributions to the public good.
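The best-response lines in figure 3.3 can be sketched numerically. All the parameters below (the no-subsidy contribution, the slope, and the crowding-out terms) are hypothetical, chosen only to illustrate the threshold s’ below which a categorically crowded-out subsidy is counterproductive:

```python
# A sketch of the linear best-response functions in figure 3.3.
# The parameters v0, b, c, m are illustrative, not estimates from the text.

def contribution(s, v0=30.0, b=0.4, c=0.0, m=0.0):
    """Contribution under a subsidy s.

    v0: contribution with no subsidy (values plus self-interest)
    b:  marginal effect of the subsidy under separability
    c:  categorical crowding out (intercept drop once any subsidy is offered)
    m:  marginal crowding out (reduction in the slope)
    """
    if s == 0:
        return v0                      # no incentive: values alone
    return (v0 - c) + (b - m) * s      # incentive present: shifted, flattened line

# Under categorical crowding out alone, subsidies below s' = c/b are
# counterproductive: contributions fall short of the no-subsidy level.
c, b = 4.0, 0.4
s_prime = c / b                        # the threshold s' in the figure
print(contribution(0))                 # 30.0: no subsidy
print(contribution(5, c=c))            # 28.0: s < s', counterproductive
print(contribution(s_prime, c=c))      # 30.0: break-even at s'
print(contribution(20, c=c))           # 34.0: s > s', the subsidy helps
```

With these made-up numbers, any subsidy between 0 and s’ = 10 lowers contributions below the no-subsidy level of 30, which is exactly the trap the figure warns the Legislator about.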

The Legislator happily adds the best-response functions in figure 3.3 to his tool kit. Looking at the figure, he imagines the plight of the naïve legislator, who is unaware of the crowding-out problem and so believes that the top line (“separability”) gives the relevant policy options. Unless crowding in should occur (not shown in the figure), the naïve legislator will sometimes be disappointed when the results (the citizens’ contributions) fall short of what he predicted by mistakenly using the best response based on separability.

Measuring Categorical and Marginal Crowding Out

This is not simply a thought experiment. A remarkable study shows that the effects of incentives can be estimated empirically and that both categorical and marginal crowding out do occur. Bernd Irlenbusch and Gabriele Ruchala implemented a public-goods experiment in which 192 German students faced three conditions: no incentives to contribute, and a bonus, given to the highest-contributing individual, that was either high or low.15 Payoffs were such that even with no incentive, individuals would maximize their payoffs by contributing twenty-five units. The units given and received were later converted into equivalents in euros, so as with the experiment among Colombian villagers, the game was played for real money, which the students kept when the game was over.

In figure 3.4 we see that in the no-incentive case, contributions averaged thirty-seven units, or 48 percent above the twenty-five units that participants would have given if they were motivated only by material rewards. As with the Colombian villagers, the German students in the experiment evinced strong social preferences.

Contributions in the low-bonus case were a bit higher than in the absence of the incentive but not significantly different. The high-bonus case saw significantly higher contributions, but the amount contributed (fifty-three units) barely (and insignificantly) exceeded that predicted for self-interested subjects (fifty units). Again, the German students are strongly reminiscent of the Colombian villagers: incentives worked, yet there was collateral cultural damage: the bonus appears to have obliterated pre-existing social preferences.


Figure 3.4. Categorical and marginal crowding out. The experimental design is a Public Goods game comparing no incentive with two team-based compensation schemes with a low or high bonus for the highest contributor in the team. The maximum contribution level is 120. (Data from Irlenbusch and Ruchala 2008; also based on calculations described in the text.)

Can we dissect the cultural damage and see why it occurs? Sandra Polanía-Reyes and I devised a way to do this.16 We assumed that marginal crowding out affects the slope of the citizens’ best-response function by a given constant amount (meaning that the function remains linear, like the lines in figure 3.3). We then were able to use the observed behavior in the high- and low-bonus cases to estimate the marginal effect of the bonus, that is, how much the incentive reduced the slope of the line. We found that a one-unit increase in the bonus was associated with a 0.31-unit increase in contributions. This contrasts with the marginal effect of 0.42 that would have occurred if subjects without social preferences had simply best responded to the incentive. Crowding out thus reduced the marginal effect of the incentive by 0.11, that is, by 26 percent of what it would have been under separability.

The estimated response to the incentive also gives us the level of categorical crowding out, namely, the difference between the observed contributions (37.04) in the absence of any incentive and the solid dot showing the predicted contributions (34.56) if an arbitrarily small incentive (the “ε incentive”) had been in effect (the vertical intercept of the line in figure 3.4 through the observed points). The incentive thus categorically reduced contributions by 2.48. This categorical reduction in contributions is 21 percent of the extent of the social preferences of the subjects, measured by the excess in observed contributions over what an entirely self-interested person would have done in the absence of an incentive.

The total effect of the subsidy, including its direct and indirect effects, can be accounted for by using the causal logic seen in panel B of figure 3.1. (The details of these calculations and analogous calculations for the small bonus appear in appendix 3.) Here is the accounting for the high bonus compared to no bonus. The direct effect of the high bonus (the top arrows in fig. 3.1) was an increase in predicted contributions of 25 (from the 37.04 that the subjects contributed without the subsidy to the 62.04 that they would have contributed had separability been the case). There were two indirect effects, one marginal and the other categorical. The categorical effect, as we have seen, was a reduction in contributions by 2.48 units. Marginal crowding out reduced contributions by 6.6 (that is, the reduction in the slope of the best-response function of 0.11 multiplied by the subsidy of 60). The total effect—the direct effect (25) minus the indirect effects (9.08)—was 15.92. Thus, the marginal crowding-out effect constituted the largest part of the negative indirect effect. With the small bonus, by contrast, categorical crowding out makes up most of the indirect effect and thus is the main source of crowding out.

Looking at figure 3.4, the Legislator can identify the policies and outcomes available to him. Were a naïve legislator to suppose, consistent with Hume’s maxim, that his citizens were knaves, then the policy’s effectiveness would be indicated by the lower line through the open dots. Were a less naïve legislator to recognize that citizens have social preferences (as Hume surely did) but that these and the incentives offered by the subsidy were separable (as Hume apparently also thought), then the policy-effectiveness curve would have been the top line (passing through the solid squares). The Aristotelian Legislator, who would know both that social preferences affect behavior and that the incentive may crowd them out, would know that the middle line represents his true options.

Categorical crowding out can be seen in other experiments. In one, reported willingness to help a stranger load a sofa into a van was much lower under a small money incentive than with no incentive at all; yet a moderate incentive increased the willingness to help.17 This suggests that categorical crowding out was at work. Using these data as Polanía-Reyes and I did in the Irlenbusch and Ruchala study, we estimated that the mere presence of the incentive reduced the willingness to help by 27 percent compared with no incentive.

Another Cardenas experiment allows us to distinguish categorical and marginal crowding, but here we observe categorical crowding in.18 This is our first evidence that incentives and social preferences can sometimes complement, rather than substitute for, each other. Because this is an aim of Aristotle’s Legislator, it is worth going through the result in some detail.

As in his earlier study, Cardenas implemented an experimental Common Pool Resource (public “bad”) game resembling the real-life conservation problem faced by his rural Colombian subjects. As before, in the absence of any explicit incentives, the villagers on average extracted less of the experimental “resource” than would have maximized their individual payoffs, providing evidence of a significant willingness to sacrifice individual gain in order to protect the resource and raise payoffs for the group. When they were made to pay a small fine if monitoring showed that they had overextracted the resource, they extracted even less than without the fine, showing that the fine had the intended effect.

But that is not the eye-catching result here: the fact that they deviated from what an entirely selfish person would have done by 25 percent more than in the absence of the incentive suggests that the fine increased the salience of the villagers’ social preferences, resulting in their placing a greater experienced value on not overextracting the resource. The small fine crowded in social preferences; the incentive yielded a collateral cultural benefit.

Tellingly, increasing the initially small fine had virtually no effect. The fine thus seems not to have worked as an incentive (if it had, the larger fine would have had a greater effect than the smaller one). In Cardenas’s view, the very presence of the fine (whether high or low did not matter) was a signal that alerted subjects to the public nature of the interaction and the importance of conserving the resource: the moral message, rather than the monetary motivation, explains the effect. The fine worked by framing the situation, not by altering the material costs and benefits of extracting from the forest.

The Legislator would like to know why, in the second Cardenas experiment, the small fine crowded in social preferences, while the opposite effect had occurred in the first experiment. The villagers were different of course, and the fine may have been framed differently. We will present other examples of fines as messages—some with positive effects, as here, and others with the more common crowding-out effect. These cases hold important lessons for why incentives are sometimes counterproductive and how, under well-designed policies, incentives can crowd in social preferences.

A Surprise for the Legislator

The Legislator knows that when the prices that structure private economic interactions fail to provide incentives for the efficient use of a society’s resources, his task is to design optimal taxes, fines, or subsidies that will correct or attenuate the resulting market failure. Which policies are optimal will of course depend on the preferences of the citizens. But the evidence just presented suggests an added twist: the preferences that determine citizens’ responses to the Legislator’s incentives depend on the incentives themselves. As a result, optimal incentives depend on the nature of the citizens’ preferences that result from this process (of imposing fines or providing subsidies), for these will determine the effects of the incentives.

The fact that preferences may depend on incentives complicates the Legislator’s task, for he cannot simply take the citizens’ preferences as given, as economists normally do when designing optimal taxes, subsidies, and other incentives. But while this difficulty is a notch up in complexity, it is not some impenetrable chicken-and-egg problem.

For any policy being contemplated, taking account of crowding out just requires the structuring of incentives to take these indirect effects into consideration. The Legislator is thus not simply selecting, say, a tax rate, but rather a tax rate and a possibly altered distribution of preferences in the population that will result from the categorical and marginal effects of this incentive. It is the joint effect of the pair—tax rate, preferences resulting from the tax rate—that the sophisticated Legislator will consider when selecting his policy.

Armed with the idea that incentives and other policies may affect preferences, and using the conceptual apparatus illustrated in figures 3.3 and 3.4, the sophisticated Legislator can rethink the problem of optimal incentives. His intuition is that because crowding out reduces their effectiveness, he should use incentives less than would his naïve counterpart, who is unaware of these adverse incentive effects.

If crowding out is “strong,” meaning that an incentive has an effect the opposite of its intent, the Legislator will of course abandon the use of that incentive. So his intuition is correct in this case. But when crowding out blunts but does not reverse the effectiveness of incentives, it may be far from obvious to the Legislator whether the optimal use of incentives is greater or less than that which his naïve counterpart would use. Contrary to his intuition, the Legislator may find that in the presence of crowding out, he will make greater rather than lesser use of incentives.

To see why, consider a case in which the Legislator would like to meet a specific level of contribution or other public-spirited action—for example, that every citizen should take at least four hours of first aid training. The Legislator believes that training beyond four hours offers little additional benefit, and that those with less than four hours of training are not much more able to help others during emergencies than those with no training at all. This is an extreme version of decreasing returns: there is no benefit of additional time in training beyond four hours.


Figure 3.5. Underuse of the incentive by the naïve legislator. The sophisticated Legislator, aware of the crowding-out problem, would select the subsidy s+, which is greater than s-, the subsidy chosen by the naïve legislator, who is unaware that the incentive and social preferences are not separable.

Thus we have the situation depicted in figure 3.5. The target (four hours in the above example) is the horizontal line, and because the subsidy is costly to implement, the planner would like to find the least subsidy that results in citizens hitting the target. The two upward-sloping lines taken from the previous two figures give, respectively, the true policy options facing the sophisticated Legislator (the lower line) and the options imagined by his naïve counterpart (the upper line). It is clear from the figure that to meet this target, the Legislator will need to adopt the subsidy s+, which is greater than s-, the subsidy adopted by a naïve legislator who is unaware of the crowding problem.
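A minimal sketch of the target case makes the comparison concrete. The numbers below (the target, intercepts, slopes, and crowding-out terms) are hypothetical, chosen only to show why the sophisticated Legislator’s subsidy exceeds the naïve legislator’s:

```python
# Hitting a fixed contribution target under crowding out.
# All parameters are illustrative, not estimates from the text.

target = 80.0        # required contribution (e.g., the four hours of training)
v0, b  = 30.0, 0.4   # intercept and slope of the best response under separability
c, m   = 4.0, 0.1    # categorical and marginal crowding out

# The naive legislator believes contribution = v0 + b*s (the upper line),
# so he picks the smallest s that reaches the target:
s_naive = (target - v0) / b

# The sophisticated Legislator knows the true (lower) line,
# contribution = (v0 - c) + (b - m)*s, and must subsidize more:
s_soph = (target - (v0 - c)) / (b - m)

print(round(s_naive, 1))   # 125.0  (the figure's s-)
print(round(s_soph, 1))    # 180.0  (the figure's s+)
```

With these made-up parameters, the naïve subsidy of 125 would leave true contributions at (30 − 4) + 0.3 × 125 = 63.5, well short of the target of 80; only the larger subsidy of 180 gets the citizens there.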

This surprising result may seem to be an artifact of our having chosen a special “hit the target” kind of objective for the Legislator. But it is not. The logic of the target case carries over to cases in which the benefits of the public good are continuously increasing as more of it is provided, but at a diminishing rate. Here is how the sophisticated Legislator will reason in this more general case.19

In the presence of crowding out, the Legislator knows that the true effectiveness of the subsidy is less than what his naïve counterpart believes; a simple comparison of the benefits (effectiveness) and the costs of implementing the policy would seem to recommend lesser use of the subsidy. This is correct; but there is a second, possibly offsetting effect.

Just as in the target case, because the incentive is less effective (either categorically or marginally) than it would be in the absence of crowding out, it follows that for any given level of the subsidy, the extent of underprovision of the public good will be greater. For any subsidy implemented in the presence of crowding out, the true best-response function of the citizen is always below the best-response function imagined by the naïve legislator (except possibly at s = 0, when the two lines coincide, if categorical crowding out is absent). As a result, the degree of public-goods underprovision anticipated by the sophisticated Legislator is greater than that anticipated by his naïve counterpart.

The sophisticated Legislator knows that a consequence of diminishing returns to the level of provision of the public good is that increases in its provision are especially beneficial when it is more underprovided. In this case, the benefit of further increasing the citizens’ contribution is correspondingly greater in the eyes of the sophisticated Legislator than to the naïve legislator, who thinks that for any subsidy level, the provision of the public good will be greater. The Legislator thus has reason to adopt a greater subsidy than would be adopted by the naïve legislator. This is the second effect of taking nonseparability into account, and it may outweigh the first effect, which stems from the reduced effectiveness of the subsidy, and which, in the absence of the second effect, would lead the Legislator to adopt a lesser subsidy. The sophisticated Legislator will choose a larger subsidy if the greater benefit of changing the citizens’ behavior (the second effect) more than offsets the diminished marginal effectiveness of the subsidy (the first effect).20

While it may seem odd that the sophisticated Legislator’s recognition of the crowding-out problem would lead to greater rather than lesser use of the subsidy, it is not. Think about the doctor who discovers that a treatment is less effective than he thought. Will he prescribe a smaller dose? Not necessarily; even if he is attentive to the cost of the treatment for the patient, he may opt for a stronger dose, or else abandon the treatment in favor of an alternative. Like the doctor, the Legislator may use a greater level of subsidy precisely because it is less effective.

But if the treatment is less effective, the doctor or the Legislator might also seek other ways to accomplish the same end. Attendance at first aid courses might be promoted by direct appeals to people’s social preferences, for example, by clarifying to citizens how important it is during natural disasters that most people know the elements of first aid. Where the Legislator has other options, knowledge of crowding out may lead him either to abandon the subsidy entirely or to combine it with direct appeals to citizens’ social preferences.

The Lab and the Street

The experimental evidence for crowding out and the guidance it might give the Legislator would be of little interest if lab results did not predict behaviors outside the lab. Generalizing directly from experiments, even for phenomena much simpler than separability, is a concern in any empirical study, and it is often unwarranted.21

Consider, for example, the Dictator game, in which one experimental subject is provisionally given a sum of money and is asked to allocate any amount (including all, none, or some fraction of it) to the second player, whose only role is to be a passive recipient. The personal identities of the dictator and the recipient are not known to each other. Typically, more than 60 percent of dictators allocate a positive sum to the recipient, and the average given is about a fifth of the sum initially granted by the experimenter to the dictator.

But we would be sadly mistaken if we inferred from this that 60 percent of people would spontaneously transfer funds to an anonymous passerby, or even that the same subjects would offer a fifth of the money in their wallet to a homeless person asking for help. Another example: experimental subjects who reported that they had never given to a charity before allocated 65 percent of their endowment to a named charity in a lab experiment.22 And one can bet that they did not empty their pockets for the next homeless person they encountered.

A possible explanation for the discrepancies between experimental and real-world behavior is that most people are strongly influenced by cues present in the situation in which they are acting, and there is no reason to think they respond any differently during experiments. An experiment about giving may prompt giving.

Human behavioral experiments raise four concerns about external validity that do not arise in most well-designed natural science experiments. First, experimental subjects typically know they are under a researcher’s microscope, and they may behave differently from how they would under total anonymity or, perhaps more relevant for the study of social behavior, under the scrutiny of neighbors, family, or workmates. Second, experimental interactions with other subjects are typically anonymous and lack opportunities for ongoing face-to-face communication, unlike many social interactions of interest to economists and policy makers. Third, subject pools—to date, overwhelmingly students—may be quite different from other populations, due to age effects and the processes of recruitment and self-selection.

Finally, the social interactions studied in most experiments are social dilemmas—variations on the Prisoner’s Dilemma or Public Goods games—or tasks involving sharing with others, like the Ultimatum game and the Dictator game. In these settings, where social preferences are likely to be important, there is something to be crowded out. But while we would be right in concluding from experimental evidence that incentives may crowd out blood donations or participation in community-service projects, we might wonder whether this evidence has as much to say about the effect of incentives on our behavior when it comes to shopping or cleaning hotel rooms. We already know that it would be a mistake to think that crowding out would diminish the effect of incentives for hard work among Tunisian sharecroppers and American workers installing windshields.

It is impossible to know whether these four aspects of behavioral experiments bias the results in ways relevant to the question of separability. For example, in most cases subjects are paid a “show up” fee to participate in an experiment. Does this practice attract the more materially oriented, who may be less motivated by social preferences subject to crowding out? Conversely, experimenters do not generally communicate the subject of their research, but if potential subjects knew that an experiment was about cooperation, those who signed up might be atypically civic minded.

We can do more than speculate about these problems. Nicole Baran and her coauthors wanted to find out whether University of Chicago business students who had acted with greater reciprocity in an experimental game also reciprocated the great education the university had provided them, by contributing more to the university following graduation.

In the Trust game that Baran implemented, subjects in the role of “investor” were provisionally given a sum from which they were to transfer some amount to another subject, called the “trustee.” This amount was then tripled by the experimenter. The trustee, knowing the investor’s choice, could in turn “back-transfer” some (or all, or none) of this tripled amount, returning a benefit to the investor. Baran asked whether those who as trustees most generously reciprocated large transfers by the investor were also more likely to donate to a University of Chicago alumni fund. They were.23
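The payoff mechanics of the Trust game can be sketched as follows; the ten-unit endowment in the example is hypothetical, since the text does not report the actual stakes:

```python
# Payoff structure of a Trust game of the kind Baran implemented.
# The investor sends a transfer, the experimenter triples it, and the
# trustee returns some share of the tripled amount to the investor.

def trust_game(endowment, transfer, back_transfer_share):
    """Return (investor_payoff, trustee_payoff) for one play of the game."""
    tripled = 3 * transfer
    back = back_transfer_share * tripled
    investor_payoff = endowment - transfer + back
    trustee_payoff = tripled - back
    return investor_payoff, trustee_payoff

# A selfish trustee returns nothing, so a selfish investor sends nothing:
print(trust_game(10, 0, 0.0))    # (10.0, 0.0)
# A trusting investor paired with a reciprocating trustee leaves both better off:
print(trust_game(10, 10, 0.5))   # (15.0, 15.0)
```

The gap between the two outcomes is the point of the game: tripling makes trust socially productive, but only reciprocity by the trustee makes it privately rational for the investor.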

Similarly, among the Japanese shrimp fishermen whom Jeffrey Carpenter and Erika Seki studied, those who contributed more in a public-goods experiment were more likely to be members of fishing cooperatives, which shared costs and catches among many boats, than to fish under the usual private boat arrangements.24 A similar pattern was found among fishermen in northeastern Brazil, where some fish offshore in large crews, whose success depends on cooperation and coordination, but those exploiting inland waters fish alone. The ocean fishers were significantly more generous in Public Goods, Ultimatum, and Dictator games than the inland fishers.25

A better test of the external validity of experiments would go beyond simply noting whether subjects took part in a cooperation-sensitive production process like offshore or cooperative fishing, and would include a behavior-based measure of individuals’ cooperativeness. The Brazilian fishers provide just such a test. Shrimp are caught in large plastic bucket-like contraptions; the fishermen cut holes in the bottoms of the traps to allow the immature shrimp to escape, thereby preserving the stock for future catches.

The fishermen thus face a real-world social dilemma: the expected income of each would be greatest if he were to cut smaller holes in his traps (increasing his own catch) while others cut larger holes in theirs (preserving future stocks). In Prisoner’s Dilemma terms, small trap holes are a form of defection that maximizes the individual’s material payoff irrespective of what others do (it is the dominant strategy). But a shrimper might resist the temptation to defect if he were both public spirited toward the other fishers and sufficiently patient to value the future opportunities that they all would lose were he to use traps with smaller holes.

Ernst Fehr and Andreas Leibbrandt implemented both a Public Goods game and an experimental measure of impatience with the shrimpers. They found that the shrimpers with both greater patience and greater cooperativeness in the experimental game punched significantly larger holes in their traps, thereby protecting future stocks for the entire community.26 The effects, controlling for a large number of other possible influences on hole size, were substantial. A shrimper whose experimentally measured patience and cooperativeness were a standard deviation greater than the mean was predicted to cut holes in his traps that were half a standard deviation larger than the mean.

Additional evidence of external validity comes from a set of experiments and field studies with forty-nine groups of herders of the Bale Oromo people in Ethiopia, who were engaged in forest-commons management. Devesh Rustagi and his coauthors implemented public-goods experiments with a total of 679 herders, and also studied the success of the herders’ cooperative forest projects.

The most common behavioral type in their experiments, constituting just over a third of the subjects, was the “conditional cooperator,” who responds to higher contributions by others by contributing more to the public good himself. Controlling for a large number of other influences on the success of the forest projects, the authors found that groups with a larger number of conditional cooperators were more successful—they planted more new trees—than those with fewer conditional cooperators. This was in part because members of groups with more conditional cooperators spent significantly more time monitoring others’ use of the forest. As with the Brazilian shrimpers, differences in the fraction of conditional cooperators in a group were associated with substantial increases in trees planted or time spent monitoring others.27

The evidence from a large number of experiments suggests that students volunteering for experiments are not more prosocial than other students; nor are they more prosocial than nonstudents. Indeed they seem to be less so. Students at Cardenas’s university in Bogota were more self-interested, according to the results of his Common Pool Resource game, than were the villagers in the experiments just described. Kansas City warehouse workers were more generous in a giving experiment (the Dictator game) than were students at Kansas City Community College. Dutch students showed less aversion to inequality in their experimental behavior than did Dutch citizens who were not students.28

When Ernst Fehr and John List played a Trust game with students and with the chief executive officers of Costa Rican businesses, they found that the businessmen in the role of investor trusted more (transferred more to the trustee) and also reciprocated the investor’s trust to a far greater degree than did the students, as can be seen in figure 3.6.29


Figure 3.6. Reciprocation of trusting offers in the Trust game among Costa Rican students and CEOs. For a given level of transfer by the investor, the CEO trustees back-transferred more than students. (Data from Fehr and List 2004.)

While these tests of the external validity of social preference experiments are encouraging, none directly tests whether those who act as if they have separable preferences in experiments do the same outside the lab. Because testing for separability in natural settings is difficult, it is not clear how such a test would be conducted as a practical matter.

Moral and Material Synergies

It appears, then, that J. S. Mill took the field of political economy in the wrong direction when he narrowed its subject to the study of the individual “solely as a being who desires to possess wealth.” Mill’s surprising exclusion of ethical and other-regarding motives would have been a harmless simplification if these motives were really absent (not something that Mill would have ever supposed) or if the effects of incentives could simply be added to the effects of the excluded motives (which is what Mill must have thought). But as we have seen, neither of these justifications can be sustained.

Motives such as reciprocity, generosity, and trust are common, and these preferences may be crowded out by the use of explicit incentives. We have seen how information about the nature (categorical, marginal, strong) and extent of crowding out can guide the sophisticated Legislator in his choice of the level of incentives.

Happy though he is with these new additions to his toolbox—the citizens’ best-response functions and what they say about the direct and indirect effects of incentives—the Legislator would surely want to go beyond simply designing appropriate policies and taking the crowding-out problem as a given. The Legislator could seek to frame incentives and other policies so that they crowd in, rather than crowd out, ethical and other-regarding motivations, as in Cardenas’s second experiment in rural Colombia.

This thought gives him an addition to his toolbox: a crowding-in best-response function. Notice from figure 3.5 that were crowding in to obtain, the true best-response function of the citizens (not shown) would lie above the separability line imagined by the naïve legislator. It would have either a vertical-axis intercept above that of the separability line (categorical crowding in, as in the Cardenas experiment) or a steeper slope (marginal crowding in, indicating greater effectiveness of the subsidy) or both. But using this new tool requires turning the crowding problem on its head and creating a synergy between social preferences and incentives so as to make the subsidy more effective than the naïve legislator expected.

Is there some way that he could transform the nonseparability of incentives and social preferences from a curse to a blessing? To do this, the sophisticated Legislator will need more than the simple taxonomy (marginal versus categorical crowding out and crowding in) introduced here. He will have to penetrate the black box that has so far obscured the causal origins of crowding out. He must discover the cognitive processes that account for the nonseparability of material interests and moral sentiments.