The Drunkard's Walk: How Randomness Rules Our Lives - Leonard Mlodinow (2008)

Chapter 5. The Dueling Laws of Large and Small Numbers

IN THEIR WORK, Cardano, Galileo, and Pascal assumed that the probabilities relevant to the problems they tackled were known. Galileo, for example, assumed that a die has an equal chance of landing on any of its six faces. But how solid is such “knowledge”? The grand duke’s dice were probably designed not to favor any face, but that doesn’t mean fairness was actually achieved. Galileo could have tested his assumption by observing a number of tosses and recording how often each face came up. If he had repeated the test several times, however, he would probably have found a slightly different distribution each time, and even small deviations might have mattered, given the tiny differential he was asked to explain. In order to make the early work on randomness applicable to the real world, that issue had to be addressed: What is the connection between underlying probabilities and observed results? What does it mean, from a practical point of view, when we say the chances are 1 in 6 a die will land on 2? If it doesn’t mean that in any series of tosses the die will land on the 2 exactly 1 time in 6, then on what do we base our belief that the chances of throwing a 2 really are 1 in 6? And what does it mean when a doctor says that a drug is 70 percent effective or has serious side effects in 1 percent of the cases or when a poll finds that a candidate has support of 36 percent of voters? These are deep questions, related to the very meaning of the concept of randomness, a concept mathematicians still like to debate.

I recently engaged in such a discussion one warm spring day with a statistician visiting from Hebrew University, Moshe, who sat across the lunch table from me at Caltech. Between spoonfuls of nonfat yogurt, Moshe espoused the opinion that truly random numbers do not exist. “There is no such thing,” he said. “Oh, they publish charts and write computer programs, but they are just fooling themselves. No one has ever found a method of producing randomness that’s any better than throwing a die, and throwing a die just won’t do it.”

Moshe waved his white plastic spoon at me. He was agitated now. I felt a connection between his feelings about randomness and his religious convictions. Moshe is an Orthodox Jew, and I know that many religious people have problems thinking God can allow randomness to exist. “Suppose you want a string of N random numbers between 1 and 6,” he told me. “You throw a die N times and record the string of N numbers that comes up. Is that a random string?”

No, he claimed, because no one can make a perfect die. There will always be some faces that are favored and some that are disfavored. It might take 1,000 throws to notice the difference, or 1 billion, but eventually you will notice it. You’ll see more 4s than 6s or maybe fewer. Any artificial device is bound to suffer from that flaw, he said, because human beings do not have access to perfection. That may be, but Nature does, and truly random events do occur on the atomic level. In fact, that is the very basis of quantum theory, and so we spent the rest of our lunch in a discussion of quantum optics.

Today cutting-edge quantum generators produce truly random numbers from the toss of Nature’s perfect quantum dice. In the past the perfection necessary for randomness was indeed an elusive goal. One of the most creative approaches came from New York City’s Harlem crime syndicates around 1920.1 Needing a daily supply of five-digit random numbers for an illegal lottery, the racketeers thumbed their noses at the authorities by employing the last five digits of the U.S. Treasury balance. (At this writing the U.S. government is in debt by $8,995,800,515,946.50, or $29,679.02 per person, so today the racketeers could have obtained their five digits from the per capita debt!) Their so-called Treasury lottery ran afoul of not only criminal law, however, but also scientific law, for according to a rule called Benford’s law, numbers arising in this cumulative fashion are not random but rather are biased in favor of the lower digits.

Benford’s law was discovered not by a fellow named Benford but by the American astronomer Simon Newcomb. Around 1881, Newcomb noticed that the pages of books of logarithms that dealt with numbers beginning with the numeral 1 were dirtier and more frayed than the pages corresponding to numbers beginning with the numeral 2, and so on, down to the numeral 9, whose pages, in comparison, looked clean and new. Assuming that in the long run, wear was proportional to amount of use, Newcomb concluded from his observations that the scientists with whom he shared the book were working with data that reflected that distribution of digits. The law’s current name arose after Frank Benford noticed the same thing, in 1938, when scrutinizing the log tables at the General Electric Research Laboratory in Schenectady, New York. But neither man proved the law. That didn’t happen until 1995, in work by Ted Hill, a mathematician at the Georgia Institute of Technology.

According to Benford’s law, rather than all nine digits’ appearing with equal frequency, the number 1 should appear as the first digit in data about 30 percent of the time; the digit 2, about 18 percent of the time; and so on, down to the digit 9, which should appear as the first digit about 5 percent of the time. A similar law, though less pronounced, applies to later digits. Many types of data obey Benford’s law, in particular, financial data. In fact, the law seems tailor-made for mining large amounts of financial data in search of fraud.
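
The percentages quoted above come from the usual statement of Benford's law, which the text does not spell out: a leading digit d is expected with probability log10(1 + 1/d). The following is a minimal Python sketch of that formula. The function names are illustrative, and the sample data (the first 1,000 powers of 2) are only a stand-in for a real ledger, chosen because that sequence is known to follow the law closely.

```python
import math
from collections import Counter

def benford_expected(digit):
    """Expected share of numbers whose first digit is `digit` (1-9)."""
    return math.log10(1 + 1 / digit)

def first_digit(n):
    """Leading decimal digit of a positive integer."""
    return int(str(n)[0])

def first_digit_shares(values):
    """Observed share of each leading digit 1-9 in `values`."""
    counts = Counter(first_digit(v) for v in values)
    total = len(values)
    return {d: counts.get(d, 0) / total for d in range(1, 10)}

# Stand-in data, not real financial records: the first 1,000 powers of 2.
sample = [2 ** k for k in range(1, 1001)]
observed = first_digit_shares(sample)

for d in range(1, 10):
    print(f"digit {d}: expected {benford_expected(d):.1%}, observed {observed[d]:.1%}")
```

A forensic check of the kind described for financial data would swap the stand-in sample for actual transaction amounts and then ask whether the gap between the observed and expected shares is larger than chance alone would explain.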

One famous application involved a young entrepreneur named Kevin Lawrence, who raised $91 million to create a chain of high-tech health clubs.2 Engorged with cash, Lawrence raced into action, hiring a bevy of executives and spending his investors’ money as quickly as he had raised it. That would have been fine except for one detail: he and his cohorts were spending most of the money not on the business but on personal items. And since several homes, twenty personal watercraft, forty-seven cars (including five Hummers, four Ferraris, three Dodge Vipers, two DeTomaso Panteras, and a Lamborghini Diablo), two Rolex watches, a twenty-one-carat diamond bracelet, a $200,000 samurai sword, and a commercial-grade cotton candy machine would have been difficult to explain as necessary business expenditures, Lawrence and his pals tried to cover their tracks by moving investors’ money through a complex web of bank accounts and shell companies to give the appearance of a bustling and growing business. Unfortunately for them, a suspicious forensic accountant named Darrell Dorrell compiled a list of over 70,000 numbers representing their various checks and wire transfers and compared the distribution of digits with Benford’s law. The numbers failed the test.3 That, of course, was only the beginning of the investigation, but from there the saga unfolded predictably, ending the day before Thanksgiving 2003, when, flanked by his attorneys and clad in light blue prison garb, Kevin Lawrence was sentenced to twenty years without possibility of parole. The IRS has also studied Benford’s law as a way to identify tax cheats. One researcher even applied the law to thirteen years of Bill Clinton’s tax returns. They passed the test.4

Presumably neither the Harlem syndicate nor its customers noticed these regularities in their lottery numbers. But had people like Newcomb, Benford, or Hill played their lottery, in principle they could have used Benford’s law to make favorable bets, earning a nice supplement to their scholar’s salary.

In 1947, scientists at the Rand Corporation needed a large table of random digits for a more admirable purpose: to help find approximate solutions to certain mathematical equations employing a technique aptly named the Monte Carlo method. To generate the digits, they employed electronically generated noise, a kind of electronic roulette wheel. Is electronic noise random? That is a question as subtle as the definition of randomness itself.

In 1896 the American philosopher Charles Sanders Peirce wrote that a random sample is one “taken according to a precept or method which, being applied over and over again indefinitely, would in the long run result in the drawing of any one of a set of instances as often as any other set of the same number.”5 That is called the frequency interpretation of randomness. The main alternative to it is called the subjective interpretation. Whereas in the frequency interpretation you judge a sample by the way it turned out, in the subjective interpretation you judge a sample by the way it is produced. According to the subjective interpretation, a number or set of numbers is considered random if we either don’t know or cannot predict how the process that produces it will turn out.

The difference between the two interpretations is more nuanced than it may seem. For example, in a perfect world a throw of a die would be random by the first definition but not by the second, since all faces would be equally probable but we could (in a perfect world) employ our exact knowledge of the physical conditions and the laws of physics to determine before each throw exactly how the die will land. In the imperfect real world, however, a throw of a die is random according to the second definition but not the first. That’s because, as Moshe pointed out, owing to its imperfections, a die will not land on each face with equal frequency; nevertheless, because of our limitations we have no prior knowledge about any face being favored over any other.

In order to decide whether their table was random, the Rand scientists subjected it to various tests. Upon closer inspection, their system was shown to have biases, just like Moshe’s archetypally imperfect dice.6 The Rand scientists made some refinements to their system but never managed to completely banish the regularities. As Moshe said, complete chaos is ironically a kind of perfection. Still, the Rand numbers proved random enough to be useful, and the company published them in 1955 under the catchy title A Million Random Digits.

In their research the Rand scientists ran into a roulette-wheel problem that had been discovered, in some abstract way, almost a century earlier by an Englishman named Joseph Jagger.7 Jagger was an engineer and a mechanic in a cotton factory in Yorkshire, and so he had an intuitive feel for the capabilities—and the shortcomings—of machinery and one day in 1873 turned his intuition and fertile mind from cotton to cash. How perfectly, he wondered, can the roulette wheels in Monte Carlo really work?

The roulette wheel—invented, at least according to legend, by Blaise Pascal as he was tinkering with an idea for a perpetual-motion machine—is basically a large bowl with partitions (called frets) that are shaped like thin slices of pie. When the wheel is spun, a marble first bounces along the rim of the bowl but eventually comes to rest in one of the compartments, which are numbered 1 through 36, plus 0 (and 00 on American roulette wheels). The bettor’s job is simple: to guess in which compartment the marble will land. The existence of roulette wheels is pretty good evidence that legitimate psychics don’t exist, for in Monte Carlo if you bet $1 on a compartment and the marble lands there, the house pays you $35 (plus your initial dollar). If psychics really existed, you’d see them in places like that, hooting and dancing and pushing wheelbarrows of cash down the street, and not on Web sites calling themselves Zelda Who Knows All and Sees All and offering twenty-four-hour free online love advice in competition with about 1.2 million other Web psychics (according to Google). For me both the future and, increasingly, the past unfortunately appear obscured by a thick fog. But I do know one thing: my chances of losing at European roulette are 36 out of 37; my chances of winning, 1 out of 37. That means that for every $1 I bet, the casino stands to win (36/37 × $1) - (1/37 × $35). That comes to 1/37 of a dollar, or about 2.7¢. Depending on my state of mind, it’s either the price I pay for the enjoyment of watching a little marble bounce around a big shiny wheel or else the price I pay for the opportunity of having lightning strike me (in a good way). At least that is how it is supposed to work.
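
The house-edge arithmetic above can be checked with exact fractions. This is a small sketch, assuming every pocket is equally likely and a single-number bet pays 35 to 1. The function name `house_edge` is illustrative, and the American figure assumes the same payout on a 38-pocket wheel.

```python
from fractions import Fraction

def house_edge(pockets, payout=35, stake=1):
    """Casino's expected win per `stake` wagered on a single number.

    pockets: number of compartments on the wheel (assumed equally likely).
    payout:  amount paid on a win, in addition to returning the stake.
    """
    p_win = Fraction(1, pockets)
    p_lose = 1 - p_win
    return p_lose * stake - p_win * payout

european = house_edge(37)   # pockets 0 and 1 through 36
american = house_edge(38)   # adds the 00 pocket; same 35-to-1 payout assumed

print(f"European wheel: {european!s} of a dollar, about {float(european):.1%}")
print(f"American wheel: {american!s} of a dollar, about {float(american):.1%}")
```

The extra pocket on the American wheel changes nothing about the payout, so the casino's margin per dollar roughly doubles.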

But does it? Only if the roulette wheels are perfectly balanced, thought Jagger, and he had worked with enough machines to share Moshe’s point of view. He was willing to bet they weren’t. So he gathered his savings, traveled to Monte Carlo, and hired six assistants, one for each of the casino’s six roulette wheels. Every day his assistants observed the wheels, writing down every number that came up in the twelve hours the casino was open. Every night, back in his hotel room, Jagger analyzed the numbers. After six days, he had not detected any bias in five of the wheels, but on the sixth wheel nine numbers came up noticeably more often than the others. And so on the seventh day he headed to the casino and started to bet heavily on the nine favored numbers: 7, 8, 9, 17, 18, 19, 22, 28, and 29.

When the casino shut that night, Jagger was up $70,000. His winnings did not go without notice. Other patrons swarmed his table, tossing down their own cash to get in on a good thing. And casino inspectors were all over him, trying to decipher his system or, better, catch him cheating. By the fourth day of betting, Jagger had amassed $300,000, and the casino’s managers were desperate to get rid of the mystery guy, or at least thwart his scheme. One imagines this being accomplished by a burly fellow from Brooklyn. Actually the casino employees did something far more clever.

On the fifth day, Jagger began to lose. His losing, like his winning, was not something you could spot immediately. Both before and after the casino’s trick, he would win some and lose some, only now he lost more often than he won instead of the other way around. With the casino’s small margin, it would take some pretty diligent betting to drain Jagger’s funds, but after four days of sucking in casino money, he wasn’t about to let up on the straw. By the time his change of luck deterred him, Jagger had lost half his fortune. One may imagine that by then his mood—not to mention the mood of his hangers-on—was sour. How could his scheme have suddenly failed?

Jagger at last made an astute observation. In the dozens of hours he had spent winning, he had come to notice a tiny scratch on the roulette wheel. This scratch was now absent. Had the casino kindly touched it up so that he could drive them to bankruptcy in style? Jagger guessed not and checked the other roulette wheels. One of them had a scratch. The casino managers had correctly guessed that Jagger’s days of success were somehow related to the wheel he was playing, and so overnight they had switched wheels. Jagger relocated and again began to win. Soon he had pumped his winnings past where they had been, to almost half a million.

Unfortunately for Jagger, the casino’s managers, finally zeroing in on his scheme, found a new way to thwart him. They decided to move the frets each night after closing, turning them along the wheel so that each day the wheel’s imbalance would favor different numbers, numbers unknown to Jagger. Jagger started losing again and finally quit. His gambling career over, he left Monte Carlo with $325,000 in hand, about $5 million in today’s dollars. Back home, he left his job at the mill and invested his money in real estate.

It may appear that Jagger’s scheme had been a sure thing, but it wasn’t. For even a perfectly balanced wheel will not come up on 0, 1, 2, 3, and so on, with exactly equal frequencies, as if the numbers in the lead would politely wait for the laggards to catch up. Instead, some numbers are bound to come up more often than average and others less often. And so even after six days of observations, there remained a chance that Jagger was wrong. The higher frequencies he observed for certain numbers may have arisen by chance and may not have reflected higher probabilities. That means that Jagger, too, had to face the question we raised at the start of this chapter: given a set of underlying probabilities, how closely can you expect your observations of a system to conform to those probabilities? Just as Pascal’s work was done in the new climate of (the scientific) revolution, so this question would be answered in the midst of a revolution, this one in mathematics—the invention of calculus.
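
A quick simulation makes the point concrete. The sketch below spins a perfectly fair 37-pocket wheel using Python's pseudo-random generator and tallies the results. The number of spins is a hypothetical choice (the text does not say how many Jagger's assistants recorded), and the seed is arbitrary.

```python
import random
from collections import Counter

def spin_counts(n_spins, seed):
    """Tally the outcomes of `n_spins` spins of a perfectly fair 37-pocket wheel."""
    rng = random.Random(seed)
    return Counter(rng.randrange(37) for _ in range(n_spins))

n_spins = 3700                      # hypothetical sample: 100 expected hits per pocket
counts = spin_counts(n_spins, seed=1873)
expected = n_spins / 37

most, least = max(counts.values()), min(counts.values())
print(f"expected per pocket: {expected:.0f}")
print(f"luckiest pocket: {most} hits, unluckiest pocket: {least} hits")
```

Even though no pocket is favored, some come up more often than others, which is exactly the kind of spread an observer has to rule out before concluding that a wheel is biased.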

IN 1680 a great comet sailed through our neighborhood of the solar system, close enough that the tiny fraction of sunlight it reflected was sufficient to make it prominent in the night sky of our own planet. It was in that part of earth’s orbit called November that the comet was first spotted, and for months afterward it remained an object of intense scrutiny, its path recorded in great detail. In 1687, Isaac Newton would use these data as an example of his inverse square law of gravity at work. And on one clear night in that parcel of land called Basel, Switzerland, another man destined for greatness was also paying attention. He was a young theologian who, gazing at the bright, hazy light of the comet, realized that it was mathematics, not the church, with which he wanted to occupy his life.8 With that realization sprouted not just Jakob Bernoulli’s own career change but also what would become the greatest family tree in the history of mathematics: in the century and a half between Jakob’s birth and 1800 the Bernoulli family produced a great many offspring, about half of whom were gifted, including eight noted mathematicians, and three (Jakob, his younger brother Johann, and Johann’s son Daniel) who are today counted as among the greatest mathematicians of all times.

Comets at the time were considered by theologians and the general public alike as a sign of divine anger, and God must have seemed pretty pissed off to create this one—it occupied more than half the visible sky. One preacher called it a “heavenly warning of the Allpowerful and Holy God written and placed before the powerless and unholy children of men.” It portended, he wrote, “a noteworthy change in spirit or in worldly matters” for their country or town.9 Jakob Bernoulli had another point of view. In 1681 he published a pamphlet titled Newly Discovered Method of How the Path of a Comet or Tailed Star Can Be Reduced to Certain Fundamental Laws, and Its Appearance Predicted.

Bernoulli had scooped Newton on the comet by six years. At least he would have scooped him had his theory been correct. It wasn’t, but claiming publicly that comets follow natural law and not God’s whim was a gutsy thing to do, especially given that the prior year—almost fifty years after Galileo’s condemnation—the professor of mathematics at the University of Basel, Peter Megerlin, had been roundly attacked by theologians for accepting the Copernican system and had been banned from teaching it at the university. A forbidding schism lay between the mathematician-scientists and the theologians in Basel, and Bernoulli was parking himself squarely on the side of the scientists.

Bernoulli’s talent soon brought the embrace of the mathematics community, and when Megerlin died, in late 1686, Bernoulli succeeded him as professor of mathematics. By then Bernoulli was working on problems connected with games of chance. One of his major influences was a Dutch mathematician and scientist, Christiaan Huygens, who in addition to improving the telescope, being the first to understand Saturn’s rings, creating the first pendulum clock (based on Galileo’s ideas), and helping to develop the wave theory of light, had written a mathematical primer on probability inspired by the ideas of Pascal and Fermat.

For Bernoulli, Huygens’s book was an inspiration. And yet he saw in the theory Huygens presented severe limitations. It might be sufficient for games of chance, but what about aspects of life that are more subjective? How can you assign a definite probability to the credibility of legal testimony? Or to who was the better golfer, Charles I of England or Mary, Queen of Scots? (Both were keen golfers.) Bernoulli believed that for rational decision making to be possible, there must be a reliable and mathematical way to determine probabilities. His view reflected the culture of the times, in which to conduct one’s affairs in a manner that was consistent with probabilistic expectation was considered the mark of a reasonable person. But it was not just subjectivity that, in Bernoulli’s opinion, limited the old theory of randomness. He also recognized that the theory was not designed for situations of ignorance, in which the probabilities of various outcomes could be defined in principle but in practice were not known. It is the issue I discussed with Moshe and that Jagger had to address: What are the odds that an imperfect die will come up with a 6? What are your chances of contracting the plague? What is the probability that your breastplate can withstand a thrust from your opponent’s long sword? In both subjective and uncertain situations, Bernoulli believed it would be “insanity” to expect to have the sort of prior, or a priori, knowledge of probabilities envisioned in Huygens’s book.10

Bernoulli saw the answer in the same terms that Jagger later would: instead of depending on probabilities being handed to us, we should discern them through observation. Being a mathematician, he sought to make the idea precise. Given that you view a certain number of roulette spins, how closely can you nail down the underlying probabilities, and with what level of confidence? We’ll return to those questions in the next chapter, but they are not quite the questions Bernoulli was able to answer. Instead, he answered a closely related question: how well are underlying probabilities reflected in actual results? Bernoulli considered it obvious that we are justified in expecting that as we increase the number of trials, the observed frequencies will reflect—more and more accurately—their underlying probabilities. He certainly wasn’t the first to believe that. But he was the first to give the issue a formal treatment, to turn the idea into a proof, and to quantify it, asking how many trials are necessary, and how sure can we be. He was also among the first to appreciate the importance of the new subject of calculus in addressing these issues.

THE YEAR Bernoulli was named professor in Basel proved to be a milestone year in the history of mathematics: it was the year in which Gottfried Leibniz published his revolutionary paper laying out the principles of integral calculus, the complement to his 1684 paper on differential calculus. Newton would publish his own version of the subject in 1687, in his Philosophiae Naturalis Principia Mathematica, or Mathematical Principles of Natural Philosophy, often referred to simply as Principia. These advances would hold the key to Bernoulli’s work on randomness.

By the time they published, both Leibniz and Newton had worked on the subject for years, but their almost simultaneous publications begged for controversy over who should be credited for the idea. The great mathematician Karl Pearson (whom we shall encounter again in chapter 8) said that the reputation of mathematicians “stands for posterity largely not on what they did, but on what their contemporaries attributed to them.”11 Perhaps Newton and Leibniz would have agreed with that. In any case neither was above a good fight, and the one that ensued was famously bitter. At the time the outcome was mixed. The Germans and Swiss learned their calculus from Leibniz’s work, and the English and many of the French from Newton’s. From the modern standpoint there is very little difference between the two, but in the long run Newton’s contribution is often emphasized because he appears to have truly had the idea earlier and because in Principia he employed his invention in the creation of modern physics, making Principia probably the greatest scientific book ever written. Leibniz, though, had developed a better notation, and it is his symbols that are often used in calculus today.

Neither man’s publications were easy to follow. In addition to being the greatest book on science, Newton’s Principia has also been called “one of the most inaccessible books ever written.”12 And Leibniz’s work, according to one of Jakob Bernoulli’s biographers, was “understood by no one”; it was not only unclear but also full of misprints. Jakob’s brother Johann called it “an enigma rather than an explanation.”13 In fact, so incomprehensible were both works that scholars have speculated that both authors might have intentionally made their works difficult to understand to keep amateurs from dabbling. This enigmatic quality was an advantage for Jakob Bernoulli, though, for it did separate the wheat from the chaff, and his intellect fell into the former category. Hence once he had deciphered Leibniz’s ideas, he possessed a weapon shared by only a handful of others in the entire world, and with it he could easily solve problems that were exceedingly difficult for others to attempt.

The set of concepts central to both calculus and Bernoulli’s work is that of sequence, series, and limit. The term sequence means much the same thing to a mathematician as it does to anybody else: an ordered succession of elements, such as points or numbers. A series is simply the sum of a sequence of numbers. And loosely speaking, if the elements of a sequence seem to be heading somewhere—toward a particular endpoint or a particular number—then that is called the limit of the sequence.

Though calculus represents a new sophistication in the understanding of sequences, that idea, like so many others, had already been familiar to the Greeks. In the fifth century B.C., in fact, the Greek philosopher Zeno employed a curious sequence to formulate a paradox that is still debated among college philosophy students today, especially after a few beers. Zeno’s paradox goes like this: Suppose a student wishes to step to the door, which is 1 meter away. (We choose a meter here for convenience, but the same argument holds for a mile or any other measure.) Before she arrives there, she first must arrive at the halfway point. But in order to reach the halfway point, she must first arrive halfway to the halfway point—that is, at the one-quarter-way point. And so on, ad infinitum. In other words, in order to reach her destination, she must travel this sequence of distances: 1/2 meter, 1/4 meter, 1/8 meter, 1/16 meter, and so on. Zeno argued that because the sequence goes on forever, she has to traverse an infinite number of finite distances. That, Zeno said, must take an infinite amount of time. Zeno’s conclusion: you can never get anywhere.

Over the centuries, philosophers from Aristotle to Kant have debated this quandary. Diogenes the Cynic took the empirical approach: he simply walked a few steps, then pointed out that things in fact do move. To those of us who aren’t students of philosophy, that probably sounds like a pretty good answer. But it wouldn’t have impressed Zeno. Zeno was aware of the clash between his logical proof and the evidence of his senses; it’s just that, unlike Diogenes, what Zeno trusted was logic. And Zeno wasn’t just spinning his wheels. Even Diogenes would have had to admit that his response leaves us facing a puzzling (and, it turns out, deep) question: if our sensory evidence is correct, then what is wrong with Zeno’s logic?

Consider the sequence of distances in Zeno’s paradox: 1/2 meter, 1/4 meter, 1/8 meter, 1/16 meter, and so on (the increments growing ever smaller). This sequence has an infinite number of terms, so we cannot compute its sum by simply adding them all up. But we can notice that although the number of terms is infinite, those terms get successively smaller. Might there be a finite balance between the endless stream of terms and their endlessly diminishing size? That is precisely the kind of question we can address by employing the concepts of sequence, series, and limit. To see how it works, instead of trying to calculate how far the student went after the entire infinity of Zeno’s intervals, let’s take one interval at a time. Here are the student’s distances after the first few intervals:

After the first interval: 1/2 meter

After the second interval: 1/2 meter + 1/4 meter = 3/4 meter

After the third interval: 1/2 meter + 1/4 meter + 1/8 meter = 7/8 meter

After the fourth interval: 1/2 meter + 1/4 meter + 1/8 meter + 1/16 meter = 15/16 meter

There is a pattern in these numbers: 1/2 meter, 3/4 meter, 7/8 meter, 15/16 meter…The denominator is a power of two, and the numerator is one less than the denominator. We might guess from this pattern that after 10 intervals the student would have traveled 1,023/1,024 meter; after 20 intervals, 1,048,575/1,048,576 meter; and so on. The pattern makes it clear that Zeno is correct that the more intervals we include, the greater the sum of distances we obtain. But Zeno is not correct when he says that the sum is headed for infinity. Instead, the numbers seem to be approaching 1; or as a mathematician would say, 1 meter is the limit of this sequence of distances. That makes sense, because although Zeno chopped her trip into an infinite number of intervals, she had, after all, set out to travel just 1 meter.
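
The pattern can be verified with exact arithmetic rather than guessed. The following short Python sketch adds up Zeno's intervals as fractions; the function name `distance_after` is illustrative.

```python
from fractions import Fraction

def distance_after(intervals):
    """Exact distance in meters covered after the given number of Zeno's intervals."""
    return sum(Fraction(1, 2 ** k) for k in range(1, intervals + 1))

for n in (1, 2, 3, 4, 10, 20):
    total = distance_after(n)
    print(f"after {n:2d} intervals: {total!s} m, {float(1 - total):.2e} m short of the door")
```

Each partial sum falls short of 1 meter by exactly the length of the last interval added, so the shortfall halves with every step and never becomes negative.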

Zeno’s paradox concerns the amount of time it takes to make the journey, not the distance covered. If the student were forced to take individual steps to cover each of Zeno’s intervals, she would indeed be in some time trouble (not to mention the difficulty of taking submillimeter steps)! But if she is allowed to move at constant speed without pausing at Zeno’s imaginary checkpoints (and why not?), then the time it takes to travel each of Zeno’s intervals is proportional to the distance covered in that interval. Since the total distance is finite, so is the total time, and, fortunately for all of us, motion is possible after all.
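
In symbols, and assuming a constant walking speed v in meters per second (a symbol introduced here for illustration, not one the text uses), the n-th interval is 1/2^n meter long and takes 1/(2^n v) seconds. Both totals are then the same geometric series:

```latex
\[
\sum_{n=1}^{\infty} \frac{1}{2^{n}}
  = \lim_{N \to \infty} \left( 1 - \frac{1}{2^{N}} \right) = 1,
\qquad
T = \sum_{n=1}^{\infty} \frac{1}{2^{n} v}
  = \frac{1}{v} \sum_{n=1}^{\infty} \frac{1}{2^{n}} = \frac{1}{v}.
\]
```

The total time T is simply the time needed to walk 1 meter at speed v, however finely the trip is subdivided.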

Though the modern concept of limits wasn’t worked out until long after Zeno’s life, and even Bernoulli’s—it came in the nineteenth century14—it is this concept that informs the spirit of calculus, and it is in this spirit that Jakob Bernoulli attacked the relationship between probabilities and observation. In particular, Bernoulli investigated what happens in the limit of an arbitrarily large number of repeated observations. Toss a (balanced) coin 10 times and you might observe 7 heads, but toss it 1 zillion times and you’ll most likely get very near 50 percent. In the 1940s a South African mathematician named John Kerrich decided to test this out in a practical experiment, tossing a coin what must have seemed like 1 zillion times—actually it was 10,000—and recording the results of each toss.15 You might think Kerrich would have had better things to do, but he was a war prisoner at the time, having had the bad luck of being a visitor in Copenhagen when the Germans invaded Denmark in April 1940. According to Kerrich’s data, after 100 throws he had only 44 percent heads, but by the time he reached 10,000, the number was much closer to half: 50.67 percent. How do you quantify this phenomenon? The answer to that question was Bernoulli’s accomplishment.
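
An experiment in the spirit of Kerrich's takes a fraction of a second on a computer. The sketch below is a simulation with a pseudo-random generator, not a reproduction of Kerrich's data. The function name, the checkpoints, and the seed are all arbitrary choices made for illustration.

```python
import random

def running_share_of_heads(n_tosses, checkpoints, seed):
    """Toss a simulated fair coin and record the share of heads at each checkpoint."""
    rng = random.Random(seed)
    heads = 0
    shares = {}
    for toss in range(1, n_tosses + 1):
        heads += rng.random() < 0.5      # True counts as 1
        if toss in checkpoints:
            shares[toss] = heads / toss
    return shares

shares = running_share_of_heads(10_000, checkpoints={10, 100, 1_000, 10_000}, seed=1940)
for n, share in shares.items():
    print(f"after {n:>6,} tosses: {share:.2%} heads")
```

Changing the seed changes the early percentages noticeably, while the figure at 10,000 tosses stays close to 50 percent. That contrast is the phenomenon Bernoulli set out to quantify.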

According to the historian and philosopher of science Ian Hacking, Bernoulli’s work “came before the public with a brilliant portent of all the things we know about it now; its mathematical profundity, its unbounded practical applications, its squirming duality and its constant invitation for philosophizing. Probability had fully emerged.” In Bernoulli’s more modest words, his study proved to be one of “novelty, as well as…high utility.” It was also an effort, Bernoulli wrote, of “grave difficulty.”16 He worked on it for twenty years.

JAKOB BERNOULLI called the high point of his twenty-year effort his “golden theorem.” Modern versions of it that differ in their technical nuance go by various names: Bernoulli’s theorem, the law of large numbers, and the weak law of large numbers. The phrase law of large numbers is employed because, as we’ve said, Bernoulli’s theorem concerns the way results reflect underlying probabilities when we make a large number of observations. But we’ll stick with Bernoulli’s terminology and call his theorem the golden theorem because we will be discussing it in its original form.[17]

Although Bernoulli’s interest lay in real-world applications, some of his favorite examples involved an item not found in most households: an urn filled with colored pebbles. In one scenario, he envisioned the urn holding 3,000 white pebbles and 2,000 black ones, a ratio of 60 percent white to 40 percent black. In this example you conduct a series of blind drawings from the urn “with replacement”—that is, replacing each pebble before drawing the next in order not to alter the 3:2 ratio. The a priori chances of drawing a white pebble are then 3 out of 5, or 60 percent, and so in this example Bernoulli’s central question becomes, how strictly should you expect the proportion of white pebbles drawn to hew to the 60 percent ratio, and with what probability?

The urn example is a good one because the same mathematics that describes drawing pebbles from an urn can be employed to describe any series of trials in which each trial has two possible outcomes, as long as those outcomes are random and the trials are independent of each other. Today such trials are called Bernoulli trials, and a series of Bernoulli trials is a Bernoulli process. When a random trial has two possible outcomes, one is often arbitrarily labeled “success” and the other “failure.” The labeling is not meant to be literal and sometimes has nothing to do with the everyday meaning of the words—say, in the sense that if you can’t wait to read on, this book is a success, and if you are using this book to keep yourself and your sweetheart warm after the logs burned down, it is a failure. Flipping a coin, deciding to vote for candidate A or candidate B, giving birth to a boy or girl, buying or not buying a product, being cured or not being cured, even dying or living are examples of Bernoulli trials. Actions that have multiple outcomes can also be modeled as Bernoulli trials if the question you are asking can be phrased in a way that has a yes or no answer, such as “Did the die land on the number 4?” or “Is there any ice left on the North Pole?” And so, although Bernoulli wrote about pebbles and urns, all his examples apply equally to these and many other analogous situations.

With that understanding we return to the urn, 60 percent of whose pebbles are white. If you draw 100 pebbles from the urn (with replacement), you might find that exactly 60 of them are white, but you might also draw just 50 white pebbles or 59. What are the chances that you will draw between 58 percent and 62 percent white pebbles? What are the chances you’ll draw between 59 percent and 61 percent? How much more confident can you be if instead of 100, you draw 1,000 pebbles or 1 million? You can never be 100 percent certain, but can you draw enough pebbles to be 99.9999 percent certain that you will draw, say, between 59.9 percent and 60.1 percent white pebbles? Bernoulli’s golden theorem addresses questions such as these.

In order to apply the golden theorem, you must make two choices. First, you must specify your tolerance of error. How near to the underlying proportion of 60 percent are you demanding that your series of trials come? You must choose an interval, such as plus or minus 1 percent or 2 percent or 0.00001 percent. Second, you must specify your tolerance of uncertainty. You can never be 100 percent sure a trial will yield the result you are aiming for, but you can ensure that you will get a satisfactory result 99 times out of 100 or 999 out of 1,000.

The golden theorem tells you that it is always possible to draw enough pebbles to be almost certain that the percentage of white pebbles you draw will be near 60 percent no matter how demanding you want to be in your personal definition of almost certain and near. It also gives a numerical formula for calculating the number of trials that are “enough,” given those definitions.
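The first claim can be checked numerically for the urn. The sketch below does not use Bernoulli’s own formula. It simply adds up exact binomial probabilities, which a computer can do directly. The function names are invented for this illustration, and the calculation runs in logarithms only so that large numbers of draws do not underflow.

```python
from math import ceil, exp, floor, lgamma, log


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    computed in log space to stay accurate for large n."""
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))
    return exp(log_pmf)


def prob_within(n, p, tol):
    """Chance that the observed proportion in n draws lands within
    tol of the true proportion p (endpoints included)."""
    eps = 1e-9  # absorbs floating-point fuzz at the interval edges
    k_lo = max(0, ceil((p - tol) * n - eps))
    k_hi = min(n, floor((p + tol) * n + eps))
    return sum(binom_pmf(k, n, p) for k in range(k_lo, k_hi + 1))


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        chance = prob_within(n, 0.6, 0.02)
        print(f"{n:>6} draws: P(58% to 62% white) = {chance:.5f}")
```

For a fixed tolerance of plus or minus 2 percent, the probability climbs toward 1 as the number of draws grows, which is the behavior the theorem guarantees.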

The first part of the law was a conceptual triumph, and it is the only part that survives in modern versions of the theorem. Concerning the second part—Bernoulli’s formula—it is important to understand that although the golden theorem specifies a number of trials that is sufficient to meet your goals of confidence and accuracy, it does not say you can’t accomplish those goals with fewer trials. That doesn’t affect the first part of the theorem, for which it is enough to know simply that the number of trials specified is finite. But Bernoulli also intended the number given by his formula to be of practical use. Unfortunately, in most practical applications it isn’t. For instance, here is a numerical example Bernoulli worked out himself, although I have changed the context: Suppose 60 percent of the voters in Basel support the mayor. How many people must you poll for the chances to be 99.9 percent that you will find the mayor’s support to be between 58 percent and 62 percent—that is, for the result to be accurate within plus or minus 2 percent? (Assume, in order to be consistent with Bernoulli, that the people polled are chosen at random, but with replacement. In other words, it is possible that you poll a person more than once.) The answer is 25,550, which in Bernoulli’s time was roughly the entire population of Basel. That this number was impractical wasn’t lost on Bernoulli. He also knew that accomplished gamblers can intuitively guess their chances of success at a new game based on a sample of far fewer than thousands of trial games.

One reason Bernoulli’s numerical estimate was so far from optimal was that his proof was based on many approximations. Another reason was that he chose 99.9 percent as his standard of certainty—that is, he required that he get the wrong answer (an answer that differed more than 2 percent from the true one) less than 1 time in 1,000. That is a very demanding standard. Bernoulli called it moral certainty, meaning the degree of certainty he thought a reasonable person would require in order to make a rational decision. It is perhaps a measure of how much the times have changed that today we’ve abandoned the notion of moral certainty in favor of the one we encountered in the last chapter, statistical significance, meaning that your answer will be wrong less than 1 time in 20.

With today’s mathematical methods, statisticians have shown that in a poll like the one I described, you can achieve a statistically significant result with an accuracy of plus or minus 5 percent by polling only 370 subjects. And if you poll 1,000, you have roughly an 80 percent chance of coming within 2 percent of the true result (60 percent approval of Basel’s mayor). But despite its limitations, Bernoulli’s golden theorem was a milestone because it showed, at least in principle, that a large enough sample will almost certainly reflect the underlying makeup of the population being sampled.
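One common modern shortcut for sizing a poll is the normal approximation to the binomial distribution. The sketch below assumes that approximation. It is not necessarily the method behind the figures quoted above, and the function name is made up for this example.

```python
from math import ceil
from statistics import NormalDist


def sample_size(p, tol, confidence):
    """Approximate number of respondents needed so that the observed
    proportion falls within tol of p with the given confidence,
    using the normal approximation to the binomial."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    return ceil(z * z * p * (1 - p) / tol ** 2)


if __name__ == "__main__":
    print("95% confidence, +/-5%:  ", sample_size(0.6, 0.05, 0.95))
    print("99.9% confidence, +/-2%:", sample_size(0.6, 0.02, 0.999))
```

Under this approximation the first case comes out close to the 370 subjects mentioned above, and even Bernoulli’s demanding standard of 99.9 percent within 2 percent calls for far fewer than 25,550 respondents.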

IN REAL LIFE we don’t often get to observe anyone’s or anything’s performance over thousands of trials. And so if Bernoulli required an overly strict standard of certainty, in real-life situations we often make the opposite error: we assume that a sample or a series of trials is representative of the underlying situation when it is actually far too small to be reliable. For instance, if you polled exactly 5 residents of Basel in Bernoulli’s day, a calculation like the ones we discussed in chapter 4 shows that the chances are only about 1 in 3 that you will find that 60 percent of the sample (3 people) supported the mayor.

Only 1 in 3? Shouldn’t the true percentage of the mayor’s supporters be the most probable outcome when you poll a sample of voters? In fact, finding 3 supporters is the most probable outcome: the odds of finding 0, 1, 2, 4, or 5 supporters are each lower than the odds of finding 3. Nevertheless, finding 3 supporters is not likely: because there are so many of those nonrepresentative possibilities, their combined odds add up to roughly twice the odds that your poll accurately reflects the population. And so in a poll of 5 voters, 2 times out of 3 you will observe the “wrong” percentage. In fact, about 1 in 10 times you’ll find that all the voters you polled agree on whether they like or dislike the mayor. And so if you paid any attention to a sample of 5, you’d probably severely over- or underestimate the mayor’s true popularity.
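The arithmetic behind these figures is the binomial formula: the chance of finding exactly k supporters among 5 people polled is the number of ways to choose k of the 5, times 0.6 for each supporter and 0.4 for each nonsupporter. A minimal sketch, with a function name chosen only for this illustration:

```python
from math import comb


def poll_distribution(n=5, p=0.6):
    """Probability of finding k supporters, for k = 0..n, when each
    person polled independently supports the mayor with probability p."""
    return [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]


dist = poll_distribution()
for k, prob in enumerate(dist):
    print(f"{k} supporters: {prob:.4f}")
print(f"all 5 agree (0 or 5 supporters): {dist[0] + dist[5]:.4f}")
```

The entry for 3 supporters is 0.3456, the “about 1 in 3” above, and it is the largest single entry. The two unanimous outcomes together come to 0.088, a little under 1 in 10.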

The misconception, or the mistaken intuition, that a small sample accurately reflects underlying probabilities is so widespread that Kahneman and Tversky gave it a name: the law of small numbers.[18] The law of small numbers is not really a law. It is a sarcastic name describing the misguided attempt to apply the law of large numbers when the numbers aren’t large.

If people applied the (untrue) law of small numbers only to urns, there wouldn’t be much impact, but as we’ve said, many events in life are Bernoulli processes, and so our intuition often leads us to misinterpret what we observe. That is why, as I described in chapter 1, when people observe the handful of more successful or less successful years achieved by the Sherry Lansings and Mark Cantons of the world, they assume that their past performance accurately predicts their future performance.

Let’s apply these ideas to an example I mentioned briefly in chapter 4: the situation in which two companies compete head-to-head or two employees within a company compete. Think now of the CEOs of the Fortune 500 companies. Let’s assume that, based on their knowledge and abilities, each CEO has a certain probability of success each year (however his or her company may define that). And to make things simple, let’s assume that for these CEOs successful years occur with the same frequency as the white pebbles or the mayor’s supporters: 60 percent. (Whether the true number is a little higher or a little lower doesn’t affect the thrust of this argument.) Does that mean we should expect, in a given five-year period, that a CEO will have precisely three good years?

No. As the earlier analysis showed, even if the CEOs all have a nice cut-and-dried 60 percent success rate, the chances that in a given five-year period a particular CEO’s performance will reflect that underlying rate are only 1 in 3! Translated to the Fortune 500, that means that over the past five years about 333 of the CEOs would have exhibited performance that did not reflect their true ability. Moreover, we should expect, by chance alone, about 1 in 10 of the CEOs to have five winning or losing years in a row. What does this tell us? It is more reliable to judge people by analyzing their abilities than by glancing at the scoreboard. Or as Bernoulli put it, “One should not appraise human action on the basis of its results.”[19]
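A small simulation makes the point concrete. The sketch below invents 500 identical CEOs, each with a 60 percent chance of a good year, and counts how many five-year records fail to show exactly three good years and how many show five wins or five losses in a row. Everything here is hypothetical: the function name, the seed, and above all the assumption that every year is an independent trial.

```python
import random


def simulate_ceos(n_ceos=500, years=5, p_success=0.6,
                  representative=3, seed=1):
    """Count records that do not match the 'representative' number of
    good years, and records that are all wins or all losses."""
    rng = random.Random(seed)
    unrepresentative = 0
    streaks = 0
    for _ in range(n_ceos):
        good_years = sum(rng.random() < p_success for _ in range(years))
        if good_years != representative:
            unrepresentative += 1
        if good_years in (0, years):  # five losing or five winning years
            streaks += 1
    return unrepresentative, streaks


if __name__ == "__main__":
    off, runs = simulate_ceos()
    print(f"{off} of 500 records do not show exactly 3 good years")
    print(f"{runs} of 500 records are five-year streaks")
```

The counts change with the seed, but they should hover near the exact expectations. With the unrounded probability of 0.3456 for three good years, those are about 327 misleading records (the figure of 333 comes from rounding to 1 in 3) and about 44 streaks.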

Going against the law of small numbers requires character. For while anyone can sit back and point to the bottom line as justification, assessing instead a person’s actual knowledge and actual ability takes confidence, thought, good judgment, and, well, guts. You can’t just stand up in a meeting with your colleagues and yell, “Don’t fire her. She was just on the wrong end of a Bernoulli series.” Nor is it likely to win you friends if you stand up and say of the gloating fellow who just sold more Toyota Camrys than anyone else in the history of the dealership, “It was just a random fluctuation.” And so it rarely happens. Executives’ winning years are attributed to their brilliance, explained retroactively through incisive hindsight. And when people don’t succeed, we often assume the failure accurately reflects the proportion with which their talents and their abilities fill the urn.

Another mistaken notion connected with the law of large numbers is the idea that an event is more or less likely to occur because it has or has not happened recently. The idea that the odds of an event with a fixed probability increase or decrease depending on recent occurrences of the event is called the gambler’s fallacy. For example, if Kerrich landed, say, 44 heads in the first 100 tosses, the coin would not develop a bias toward heads in order to catch up! The belief that it would is at the root of such ideas as “Her luck has run out” and “He is due.” That does not happen. For what it’s worth, a good streak doesn’t jinx you, and a bad one, unfortunately, does not mean better luck is in store. Still, the gambler’s fallacy affects more people than you might think, if not on a conscious level then on an unconscious one. People expect good luck to follow bad luck, or they worry that bad will follow good.
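The independence of successive tosses can also be checked by simulation. The sketch below assumes a fair simulated coin and looks only at the tosses that come immediately after a run of tails, asking how often they turn up heads. The function name, run length, and seed are arbitrary choices for this example.

```python
import random


def heads_after_tails_run(n_tosses=200_000, run_length=4, seed=2):
    """Return the fraction of heads among tosses that directly follow
    run_length consecutive tails, and how many such tosses there were."""
    rng = random.Random(seed)
    tosses = [rng.random() < 0.5 for _ in range(n_tosses)]  # True = heads
    followers = [
        tosses[i]
        for i in range(run_length, n_tosses)
        if not any(tosses[i - run_length:i])  # previous tosses were all tails
    ]
    return sum(followers) / len(followers), len(followers)


if __name__ == "__main__":
    fraction, count = heads_after_tails_run()
    print(f"{count} tosses followed 4 tails in a row; "
          f"{fraction:.2%} of them were heads")
```

If the coin were ever “due,” that fraction would sit noticeably above 50 percent. In a simulation like this one it should stay close to half, whatever run length you choose.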

I remember, on a cruise a few years back, watching an intense pudgy man sweating as he frantically fed dollars into a slot machine as fast as it would take them. His companion, seeing me eye them, remarked simply, “He is due.” Although tempted to point out that, no, he isn’t due, I instead walked on. After several steps I halted my progress owing to a sudden flashing of lights, ringing of bells, not a little hooting on the couple’s part, and the sound of, for what seemed like minutes, a fast stream of dollar coins flying out of the machine’s chute. Now I know that a modern slot machine is computerized, its payoffs driven by a random-number generator, which by both law and regulation must truly generate, as advertised, random numbers, making each pull of the handle completely independent of the history of previous pulls. And yet…Well, let’s just say the gambler’s fallacy is a powerful illusion.

THE MANUSCRIPT in which Bernoulli presented his golden theorem ends abruptly even though he promises earlier in the work that he will provide applications to various issues in civic affairs and economics. It is as if “Bernoulli literally quit when he saw the number 25,550,” wrote the historian of statistics Stephen Stigler.[20] In fact, Bernoulli was in the process of publishing his manuscript when he died “of a slow fever” in August 1705, at the age of fifty. His publishers asked Johann Bernoulli to complete it, but Johann refused, saying he was too busy. That may appear odd, but the Bernoullis were an odd family. If you were asked to choose the most unpleasant mathematician who ever lived, you wouldn’t be too far off if you fingered Johann Bernoulli. He has been variously described in historical texts as jealous, vain, thin-skinned, stubborn, bilious, boastful, dishonest, and a consummate liar. He accomplished much in mathematics, but he is also known for having his son Daniel tossed out of the Académie des Sciences after Daniel won a prize for which Johann himself had competed, for attempting to steal both his brother’s and Leibniz’s ideas, and for plagiarizing Daniel’s book on hydrodynamics and then faking the publication date so that his book would appear to have been published first.

When Johann was asked to complete his late brother’s manuscript, he had recently relocated to Basel from the University of Groningen, in the Netherlands, obtaining a post not in mathematics but as a professor of Greek. Jakob had found this career change suspicious, especially since in his estimation Johann did not know Greek. What Jakob suspected, he wrote Leibniz, was that Johann had come to Basel to usurp Jakob’s position. And, indeed, upon Jakob’s death, Johann did obtain it.

Johann and Jakob had not gotten along for most of their adult lives. They would regularly trade insults in mathematics publications and in letters that, one mathematician wrote, “bristle with strong language that is usually reserved for horse thieves.”[21] And so when the need arose to edit Jakob’s posthumous manuscript, the task fell further down the food chain, to Jakob’s nephew Nikolaus, the son of one of Jakob’s other brothers, also named Nikolaus. The younger Nikolaus was only eighteen at the time, but he had been one of Jakob’s pupils. Unfortunately he didn’t feel up to the task, possibly in part because he was aware of Leibniz’s opposition to his uncle’s ideas about applications of the theory. And so the manuscript lay dormant for eight years. The book was finally published in 1713 under the title Ars conjectandi, or The Art of Conjecture. Like Pascal’s Pensées, it is still in print.

Jakob Bernoulli had shown that through mathematical analysis one could learn how the inner hidden probabilities that underlie natural systems are reflected in the data those systems produce. As for the question that Bernoulli did not answer—the question of how to infer, from the data produced, the underlying probability of events—the answer would not come for several decades more.