From Eternity to Here: The Quest for the Ultimate Theory of Time - Sean Carroll (2010)



Nobody can imagine in physical terms the act of reversing the order of time. Time is not reversible.

—Vladimir Nabokov, Look at the Harlequins!

Why is it that discussions of entropy and the Second Law of Thermodynamics so often end up being about food? Here are some popular (and tasty) examples of the increase of entropy in irreversible processes:

• Breaking eggs and scrambling them.

• Stirring milk into coffee.

• Spilling wine on a new carpet.

• The diffusion of the aroma of a freshly baked pie into a room.

• Ice cubes melting in a glass of water.

To be fair, not all of these are equally appetizing; the ice-cube example is kind of bland, unless you replace the water with gin. Furthermore, I should come clean about the scrambled-eggs story. The truth is that the act of cooking the eggs in your skillet isn’t a straightforward demonstration of the Second Law; the cooking is a chemical reaction that is caused by the introduction of heat, which wouldn’t happen if the eggs weren’t an open system. Entropy comes into play when we break the eggs and whisk the yolks together with the whites; the point of cooking the resulting mixture is to avoid salmonella poisoning, not to illustrate thermodynamics.

The relationship between entropy and food arises largely from the ubiquity of mixing. In the kitchen, we are often interested in combining together two things that had been kept separate—either two different forms of the same substance (ice and liquid water) or two altogether different ingredients (milk and coffee, egg whites and yolks). The original nineteenth-century thermodynamicists were extremely interested in the dynamics of heat, and the melting ice cube would have been of foremost concern to them; they would have been less fascinated by processes where all the ingredients were at the same temperature, such as spilling wine onto a carpet. But clearly there is some underlying similarity in what is going on; an initial state in which substances are kept separate evolves into a final state in which they are mixed together. It’s easy to mix things and hard to unmix them—the arrow of time looms over everything we do in the kitchen.

Why is mixing easy and unmixing hard? When we mix two liquids, we see them swirl together and gradually blend into a uniform texture. By itself, that process doesn’t offer much clue into what is really going on. So instead let’s visualize what happens when we mix together two different kinds of colored sand. The important thing about sand is that it’s clearly made of discrete units, the individual grains. When we mix together, for example, blue sand and red sand, the mixture as a whole begins to look purple. But it’s not that the individual grains turn purple; they maintain their identities, while the blue grains and the red grains become jumbled together. It’s only when we look from afar (“macroscopically”) that it makes sense to think of the mixture as being purple; when we peer closely at the sand (“microscopically”) we see individual blue and red grains.

The great insight of the pioneers of kinetic theory—Daniel Bernoulli in Switzerland, Rudolf Clausius in Germany, James Clerk Maxwell and William Thomson in Great Britain, Ludwig Boltzmann in Austria, and Josiah Willard Gibbs in the United States—was to understand all liquids and gases in the same way we think of sand: as collections of very tiny pieces with persistent identities. Instead of grains, of course, we think of liquids and gases as composed of atoms and molecules. But the principle is the same. When milk and coffee mix, the individual milk molecules don’t combine with the individual coffee molecules to make some new kind of molecule; the two sets of molecules simply intermingle. Even heat is a property of atoms and molecules, rather than constituting some kind of fluid in its own right—the heat contained in an object is a measure of the energy of the rapidly moving molecules within it. When an ice cube melts into a glass of water, the molecules remain the same, but they gradually bump into one another and distribute their energy evenly throughout the molecules in the glass.

Without (yet) being precise about the mathematical definition of “entropy,” the example of blending two kinds of colored sand illustrates why it is easier to mix things than to unmix them. Imagine a bowl of sand, with all of the blue grains on one side of the bowl and the red grains on the other. It’s pretty clear that this arrangement is somewhat delicate—if we disturb the bowl by shaking it or stirring with a spoon, the two colors will begin to mix together. If, on the other hand, we start with the two colors completely mixed, such an arrangement is robust—if we disturb the mixture, it will stay mixed. The reason is simple: To separate out two kinds of sand that are mixed together requires a much more precise operation than simply shaking or stirring. We would have to reach in carefully with tweezers and a magnifying glass to move all of the red grains to one side of the bowl and all of the blue grains to the other. It takes much more care to create the delicate unmixed state of sand than to create the robust mixed state.

That’s a point of view that can be made fearsomely quantitative and scientific, which is exactly what Boltzmann and others managed to do in the 1870s. We’re going to dig into the guts of what they did, and explore what it explains and what it doesn’t, and how it can be reconciled with underlying laws of physics that are perfectly reversible. But it should already be clear that a crucial role is played by the large numbers of atoms that we find in macroscopic objects in the real world. If we had only one grain of red sand and one grain of blue sand, there would be no distinction between “mixed” and “unmixed.” In the last chapter we discussed how the underlying laws of physics work equally well forward or backward in time (suitably defined). That’s a microscopic description, in which we keep careful track of each and every constituent of a system. But very often in the real world, where large numbers of atoms are involved, we don’t keep track of nearly that much information. Instead, we make simplifications—thinking about the average color or temperature or pressure, rather than the specific position and momentum of each atom. When we think macroscopically, we forget (or ignore) detailed information about every particle—and that’s where entropy and irreversibility begin to come into play.


The basic idea we want to understand is “how do macroscopic features of a system made of many atoms evolve as a consequence of the motion of the individual atoms?” (I’ll use “atoms” and “molecules” and “particles” more or less interchangeably, since all we care is that they are tiny things that obey reversible laws of physics, and that you need a lot of them to make something macroscopic.) In that spirit, consider a sealed box divided in two by a wall with a hole in it. Gas molecules can bounce around on one side of the box and will usually bounce right off the central wall, but every once in a while they will sneak through to the other side. We might imagine, for example, that the molecules bounce off the central wall 995 times out of 1,000, but one-half of 1 percent of the time (each second, let’s say) they find the hole and move to the other side.


Figure 41: A box of gas molecules, featuring a central partition with a hole. Every second, each molecule has a tiny chance to go through the hole to the other side.

This example is pleasingly specific; we can examine a particular instance in detail and see what happens.123 Every second, each molecule on the left side of the box has a 99.5 percent chance of staying on that side, and a 0.5 percent chance of moving to the other side; likewise for the right side of the box. This rule is perfectly time-reversal invariant; if you made a movie of the motion of just one particle obeying this rule, you couldn’t tell whether it was being run forward or backward in time. At the level of individual particles, we can’t distinguish the past from the future.

In Figure 42 we have portrayed one possible evolution of such a box; time moves upward, as always. The box has 2,000 “air molecules” in it, and starts at time t = 1 with 1,600 molecules on the left-hand side and only 400 on the right. (You’re not supposed to ask why it starts that way—although later, when we replace “the box” with “the universe,” we will start asking such questions.) It’s not very surprising what happens as we sit there and let the molecules bounce around inside the box. Every second, there is a small chance that any particular molecule will switch sides; but, because we started with a much larger number of molecules on the one side, there is a general tendency for the numbers to even out. (Exactly like temperature, in Clausius’s formulation of the Second Law.) When there are more molecules on the left, the total number of molecules that shift from left to right will usually be larger than the number that shift from right to left. So after 50 seconds we see that the numbers are beginning to equal out, and after 200 seconds the distribution is essentially equal.


Figure 42: Evolution of 2,000 molecules in a divided box of gas. We start with 1,600 molecules on the left, 400 on the right. After 50 seconds, there are about 1,400 on the left and 600 on the right; by the time 200 seconds have passed, the molecules are distributed equally between the two sides.
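The rule behind this figure is simple enough to try out for yourself. What follows is a minimal sketch, not the program used to draw the figure: the function name `simulate_box`, the fixed random seed, and the decision to treat each molecule as an independent random draw every second are all assumptions made for illustration. Because the rule is probabilistic, any one run will differ in its details from the figure.

```python
import random

def simulate_box(n_left=1600, n_right=400, p_switch=0.005, steps=200, seed=0):
    """Return a list of (left, right) counts, one entry per second, starting at t = 0.

    Hypothetical helper: each molecule independently switches sides with
    probability p_switch every second, as described in the text.
    """
    rng = random.Random(seed)  # seeded only so that a run can be repeated
    history = [(n_left, n_right)]
    for _ in range(steps):
        # Count how many molecules cross the partition in each direction.
        left_to_right = sum(rng.random() < p_switch for _ in range(n_left))
        right_to_left = sum(rng.random() < p_switch for _ in range(n_right))
        n_left += right_to_left - left_to_right
        n_right += left_to_right - right_to_left
        history.append((n_left, n_right))
    return history

if __name__ == "__main__":
    history = simulate_box()
    for t in (0, 50, 200):
        left, right = history[t]
        print(f"t = {t:3d}: {left} on the left, {right} on the right")
```

Nothing in the rule prefers one direction of time. The drift toward balance comes entirely from starting with more molecules on one side, so more of them are available to cross.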

This box clearly displays an arrow of time. Even if we hadn’t labeled the different distributions in the figure with the specific times to which they corresponded, most people wouldn’t have any trouble guessing that the bottom box came first and the top box came last. We’re not surprised when the air molecules even themselves out, but we’d be very surprised if they spontaneously congregated all (or even mostly) on one side of the box. The past is the direction of time in which things were more segregated, while the future is the direction in which they have smoothed themselves out. It’s exactly the same thing that happens when a teaspoon of milk spreads out into a cup of coffee.

Of course, all of this is only statistical, not absolute. That is, it’s certainly possible that we could have started with an even distribution of molecules to the right and left, and just by chance a large number of them could jump to one side, leaving us with a very uneven distribution. As we’ll see, that’s unlikely, and it becomes more unlikely as we get more and more particles involved; but it’s something to keep in mind. For now, let’s ignore these very rare events and concentrate on the most likely evolution of the system.
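To get a feel for how unlikely "unlikely" is, here is a rough sketch under a simplifying assumption of my own: after the gas has settled down, each molecule is taken to be equally likely to be on either side, independently of the others. The helper `prob_at_least` is hypothetical. It gives the chance of finding at least 80 percent of the molecules on the left at a randomly chosen moment, which is not quite the same as the chance of a sudden jump from an even split.

```python
from fractions import Fraction
from math import comb

def prob_at_least(n_total, n_left_min):
    """Chance that at least n_left_min of n_total molecules are on the left,
    assuming each molecule is independently equally likely to be on either side."""
    favorable = sum(comb(n_total, k) for k in range(n_left_min, n_total + 1))
    return Fraction(favorable, 2 ** n_total)  # exact ratio of arrangements

if __name__ == "__main__":
    for n_total in (20, 200, 2000):
        p = prob_at_least(n_total, (4 * n_total) // 5)
        print(f"{n_total:5d} molecules: probability {float(p):.3g}")
```

With 20 molecules the lopsided arrangement is merely uncommon. With 2,000 it is so improbable that you would never expect to see it, and the trend only steepens as the numbers grow.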


We would like to do better than simply saying, “Yeah, it’s pretty obvious that the molecules will most likely move around until they are evenly distributed.” We want to be able to explain precisely why we have that expectation, and turn “evenly distributed” and “most likely” into rigorously quantitative statements. This is the subject matter of statistical mechanics. In the immortal words of Peter Venkman: “Back off, man, I’m a scientist.”

Boltzmann’s first major insight was the simple appreciation that there are more ways for the molecules to be (more or less) evenly distributed through the box than there are ways for them to be all huddled on the same side. Imagine that we had numbered the individual molecules, 1 through 2,000. We want to know how many ways we can arrange things so that there are a certain number of molecules on the left and a certain number on the right. For example, how many ways are there to arrange things so that all 2,000 molecules are on the left, and zero on the right? There is only one way. We’re just keeping track of whether each molecule is on the left or on the right, not any details about its specific position or momentum, so we simply put every molecule on the left side of the box.

But now let’s ask how many ways there are for there to be 1,999 molecules on the left and exactly 1 on the right. The answer is: 2,000 different ways—one for each of the specific molecules that could be the lucky one on the right side. If we ask how many ways there are to have 2 molecules on the right side, we find 1,999,000 possible arrangements. And when we get bold and consider 3 molecules on the right, with the other 1,997 on the left, we find 1,331,334,000 ways to make it happen.124

It should be clear that these numbers are growing rapidly: 2,000 is a lot bigger than 1, and 1,999,000 is a lot bigger than 2,000, and 1,331,334,000 is bigger still. Eventually, as we imagine moving more and more molecules to the right and emptying out the left, they would begin to go down again; after all, if we ask how many ways we can arrange things so that all 2,000 are on the right and zero are on the left, we’re back to only one unique way.

The situation corresponding to the largest number of different possible arrangements is, unsurprisingly, when things are exactly balanced: 1,000 molecules on the left and 1,000 molecules on the right. In that case, there are—well, a really big number of ways to make that happen. We won’t write out the whole thing, but it’s approximately 2 × 10⁶⁰⁰ different ways; a 2 followed by 600 zeroes. And that’s with only 2,000 total particles. Imagine the number of possible arrangements of atoms we could find in a real roomful of air or even a glass of water. (Objects you can hold in your hand typically have about 6 × 10²³ molecules in them—Avogadro’s Number.) The age of the universe is only about 4 × 10¹⁷ seconds, so you are welcome to contemplate how quickly you would have to move molecules back and forth before you explored every possible allowed combination.
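These counts are binomial coefficients: the number of ways to choose which of the 2,000 numbered molecules end up on the right. A short sketch makes them easy to check. The wrapper name `arrangements` is my own, while `math.comb` is part of Python's standard library.

```python
from math import comb

def arrangements(n_total, n_right):
    """Number of ways to pick which n_right of n_total labeled molecules sit on the right."""
    return comb(n_total, n_right)

if __name__ == "__main__":
    for n_right in (0, 1, 2, 3, 1000):
        w = arrangements(2000, n_right)
        # Print small counts in full; for the huge one, report its length instead.
        text = f"{w:,}" if w < 10**12 else f"a {len(str(w))}-digit number"
        print(f"{n_right:>5} on the right: {text}")
```

The balanced case comes out as a 601-digit number beginning with 2, which is what "approximately 2 followed by 600 zeroes" means.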

This is all very suggestive. There are relatively few ways for all of the molecules to be hanging out on the same side of the box, while there are very many ways for them to be distributed more or less equally—and we expect that a highly uneven distribution will evolve easily into a fairly even one, but not vice versa. But these statements are not quite the same. Boltzmann’s next step was to suggest that, if we didn’t know any better, we should expect systems to evolve from “special” configurations into “generic” ones—that is, from situations corresponding to a relatively small number of arrangements of the underlying particles, toward arrangements corresponding to a larger number of such arrangements.

Boltzmann’s goal in thinking this way was to provide a basis in atomic theory for the Second Law of Thermodynamics, the statement that the entropy will always increase (or stay constant) in a closed system. The Second Law had already been formulated by Clausius and others, but Boltzmann wanted to derive it from some simple set of underlying principles. You can see how this statistical thinking leads us in the right direction—“systems tend to evolve from uncommon arrangements into common ones” bears a family resemblance to “systems tend to evolve from low-entropy configurations into high-entropy ones.”

So we’re tempted to define “entropy” as “the number of ways we can rearrange the microscopic components of a system that will leave it macroscopically unchanged.” In our divided-box example, that would correspond to the number of ways we could rearrange individual molecules that would leave the total number on each side unchanged.

That’s almost right, but not quite. The pioneers of thermodynamics actually knew more about entropy than simply “it tends to go up.” For example, they knew that if you took two different systems and put them into contact next to each other, the total entropy would simply be the sum of the individual entropies of the two systems. Entropy is additive, just like the number of particles (but not, for example, like the temperature). But the number of rearrangements is certainly not additive; if you combine two boxes of gas, the number of ways you can rearrange the molecules between the two boxes is enormously larger than the number of ways you can rearrange them within each box.

Boltzmann was able to crack the puzzle of how to define entropy in terms of microscopic rearrangements. We use the letter W—from the German Wahrscheinlichkeit, meaning “probability” or “likelihood”—to represent the number of ways we can rearrange the microscopic constituents of a system without changing its macroscopic appearance. Boltzmann’s final step was to take the logarithm of W and proclaim that the result is proportional to the entropy.

The word logarithm sounds very highbrow, but it’s just a way to express how many digits it takes to express a number. If the number is a power of 10, its logarithm is just that power.125 So the logarithm of 10 is 1, the logarithm of 100 is 2, the logarithm of 1,000,000 is 6, and so on.

In the Appendix, I discuss some of the mathematical niceties in more detail. But those niceties aren’t crucial to the bigger picture; if you just glide quickly past any appearance of the word logarithm, you won’t be missing much. You only really need to know two things:

• As numbers get bigger, their logarithms get bigger.

• But not very fast. The logarithm of a number grows slowly as the number itself gets bigger and bigger. One billion is much greater than 1,000, but 9 (the logarithm of 1 billion) is not much greater than 3 (the logarithm of 1,000).

That last bit is a huge help, of course, when it comes to the gigantic numbers we are dealing with in this game. The number of ways to distribute 2,000 particles equally between two halves of a box is 2 × 10⁶⁰⁰, which is an unimaginably enormous quantity. But the logarithm of that number is just 600.3, which is relatively manageable.

Boltzmann’s formula for the entropy, which is traditionally denoted by S (you wouldn’t have wanted to call it E, which usually stands for energy), states that it is equal to some constant k, cleverly called “Boltzmann’s constant,” times the logarithm of W, the number of microscopic arrangements of a system that are macroscopically indistinguishable.126 That is:

S = k log W.

This is, without a doubt, one of the most important equations in all of science—a triumph of nineteenth-century physics, on a par with Newton’s codification of dynamics in the seventeenth century or the revolutions of relativity and quantum mechanics in the twentieth. If you visit Boltzmann’s grave in Vienna, you will find this equation engraved on his tombstone (see Chapter Two).127

Taking the logarithm does the trick, and Boltzmann’s formula leads to just the properties we think something called “entropy” should have—in particular, when you combine two systems, the total entropy is just the sum of the two entropies you started with. This deceptively simple equation provides a quantitative connection between the microscopic world of atoms and the macroscopic world we observe.128
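Here is a small numerical check of why the logarithm does the trick. It rests on an assumption I should state plainly: the two systems are taken to be independent, so every arrangement of one can be paired with every arrangement of the other and the counts multiply. The box sizes below and the helper name `entropy_over_k` are made up for illustration.

```python
from math import comb, isclose, log10

def entropy_over_k(w):
    """Entropy in units of Boltzmann's constant, using base-10 logarithms as in the text."""
    return log10(w)

if __name__ == "__main__":
    # Two hypothetical, independent boxes of gas.
    w_a = comb(2000, 400)  # 2,000 molecules, 400 of them on the right
    w_b = comb(500, 250)   # 500 molecules, evenly split
    w_combined = w_a * w_b  # arrangements multiply for independent systems

    s_sum = entropy_over_k(w_a) + entropy_over_k(w_b)
    s_combined = entropy_over_k(w_combined)
    print(f"S_A + S_B  = {s_sum:.4f}")
    print(f"S_combined = {s_combined:.4f}")
    print("additive:", isclose(s_sum, s_combined))
```

The numbers of arrangements multiply, but their logarithms add, which is the property the thermodynamicists already knew entropy had to have.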


As an example, we can calculate the entropy of the box of gas with a small hole in a divider that we illustrated in Figure 42. Our macroscopic observable is simply the total number of molecules on the left side or the right side. (We don’t know which particular molecules they are, nor do we know their precise coordinates and momenta.) The quantity W in this example is just the number of ways we could distribute the 2,000 total particles without changing the numbers on the left and right. If there are 2,000 particles on the left, W equals 1, and log W equals 0. Some of the other possibilities are listed in Table 1.


Table 1: The number of arrangements W, and the logarithm of that number, corresponding to a divided box of 2,000 particles with some on the left side and some on the right side.
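Entries of this kind can be generated in a few lines. This is a sketch only: the particular left/right splits listed in `left_counts` are my own choice for illustration and are not necessarily the rows of Table 1, and the function name `table_rows` is hypothetical. The values of W and log W themselves follow from the counting rule described above.

```python
from math import comb, log10

def table_rows(n_total=2000, left_counts=(2000, 1999, 1998, 1997, 1600, 1000)):
    """Return (left, right, W, log10 W) for each chosen number of molecules on the left."""
    rows = []
    for n_left in left_counts:
        w = comb(n_total, n_left)
        rows.append((n_left, n_total - n_left, w, log10(w)))
    return rows

if __name__ == "__main__":
    print(f"{'left':>6} {'right':>6} {'W':>18} {'log W':>8}")
    for n_left, n_right, w, log_w in table_rows():
        if w < 10**12:
            w_text = f"{w:,}"
        else:
            # Too long to print in full; show it in scientific notation instead.
            exponent = int(log_w)
            mantissa = 10 ** (log_w - exponent)
            w_text = f"{mantissa:.1f} x 10^{exponent}"
        print(f"{n_left:>6} {n_right:>6} {w_text:>18} {log_w:>8.1f}")
```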

In Figure 43 we see how the entropy, as defined by Boltzmann, changes in our box of gas. I’ve scaled things so that the maximum possible entropy of the box is equal to 1. It starts out relatively low, corresponding to the first configuration in Figure 42, where 1,600 molecules were on the left and only 400 on the right. As molecules gradually slip through the hole in the central divider, the entropy tends to increase. This is one particular example of the evolution; because our “law of physics” (each particle has a 0.5 percent chance of switching sides every second) involved probabilities, the details of any particular example will be slightly different. But it is overwhelmingly likely that the entropy will increase, as the system tends to wander into macroscopic configurations that correspond to larger numbers of microscopic arrangements. The Second Law of Thermodynamics in action.

So this is the origin of the arrow of time, according to Boltzmann and his friends. We start with a set of microscopic laws of physics that are time-reversal invariant: They don’t distinguish between past and future. But we deal with systems featuring large numbers of particles, where we don’t keep track of every detail necessary to fully specify the state of the system; instead, we keep track of some observable macroscopic features. The entropy characterizes (by which we mean, “is proportional to the logarithm of”) the number of microscopic states that are macroscopically indistinguishable. Under the reasonable assumption that the system will tend to evolve toward the macroscopic configurations that correspond to a large number of possible states, it’s natural that entropy will increase with time. In particular, it would be very surprising if it spontaneously decreased. The arrow of time arises because the system (or the universe) naturally evolves from rare configurations into more common configurations as time goes by.


Figure 43: The evolution of the entropy of a divided box of gas. The gas starts with most molecules on the left, and the distribution evens out in time, as we saw in Figure 42. The entropy correspondingly rises, as there are more ways for the molecules to be distributed evenly than to be mostly on one side or the other. For convenience we have plotted the entropy in terms of the maximum entropy, so the maximum value attainable on this plot is 1.
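A curve of this sort can be sketched by combining the switching rule with Boltzmann's formula. As before, this is an illustration under stated assumptions rather than the code behind the figure: the names `normalized_entropy` and `entropy_history`, the seed, and the independent-molecule update are mine. Dividing by the maximum entropy means that Boltzmann's constant and the base of the logarithm both cancel out.

```python
import random
from math import comb, log

def normalized_entropy(n_left, n_total=2000):
    """log W for the given split, divided by log W for the even split."""
    s = log(comb(n_total, n_left))
    s_max = log(comb(n_total, n_total // 2))
    return s / s_max

def entropy_history(n_left=1600, n_total=2000, p_switch=0.005, steps=200, seed=0):
    """Normalized entropy at each second as molecules randomly switch sides."""
    rng = random.Random(seed)
    values = [normalized_entropy(n_left, n_total)]
    for _ in range(steps):
        to_right = sum(rng.random() < p_switch for _ in range(n_left))
        to_left = sum(rng.random() < p_switch for _ in range(n_total - n_left))
        n_left += to_left - to_right
        values.append(normalized_entropy(n_left, n_total))
    return values

if __name__ == "__main__":
    values = entropy_history()
    for t in (0, 50, 100, 200):
        print(f"t = {t:3d}: S / S_max = {values[t]:.3f}")
```

In any given run the entropy can dip slightly from one second to the next, but the overall climb toward 1 is overwhelmingly likely.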

All of this seems superficially plausible and will turn out to be basically true. But along the way we made some “reasonable” leaps of logic, which deserve more careful examination. For the rest of this chapter we will bring to light the various assumptions that go into Boltzmann’s way of thinking about entropy, and try to decide just how plausible they are.


One interesting feature of this box-of-gas example is that the arrow of time is only temporary. After the gas has had a chance to even itself out (at around time 150 in Figure 43), nothing much happens anymore. Individual molecules will continue to bounce between the right and left sides of the box, but these will tend to average out, and the system will spend almost all of its time with approximately equal numbers of molecules on each side. Those are the kinds of configurations that correspond to the largest number of rearrangements of the individual molecules, and correspondingly have the highest entropy the system can possibly have.

A system that has the maximum entropy it can have is in equilibrium. Once there, the system basically has nowhere else to go; it’s in the kind of configuration that is most natural for it to be in. Such a system has no arrow of time, as the entropy is not increasing (or decreasing). To a macroscopic observer, a system in equilibrium appears static, not changing at all.

Richard Feynman, in The Character of Physical Law, tells a story that illustrates the concept of equilibrium.129 Imagine you are sitting on a beach when you are suddenly hit with a tremendous downpour of rain. You’ve brought along a towel, but that also gets wet as you dash to cover. Once you’ve reached some cover, you start to dry yourself with your towel. It works for a little while because the towel is a bit drier than you are, but soon you find that the towel has gotten so wet that rubbing yourself with it is keeping you wet just as fast as it’s making you dry. You and the towel have reached “wetness equilibrium,” and it can’t make you any drier. Your situation maximizes the number of ways the water molecules can arrange themselves on you and the towel.130

Once you’ve reached equilibrium, the towel is no longer useful for its intended purpose (drying you off). Note that the total amount of water doesn’t change as you dry yourself off; it is simply transferred from you to the towel. Similarly, the total energy doesn’t change in a box of gas that is isolated from the rest of the world; energy is conserved, at least in circumstances where we can neglect the expansion of space. But energy can be arranged in more or less useful ways. When energy is arranged in a low-entropy configuration, it can be harnessed to perform useful work, like propelling a vehicle. But the same amount of energy, when it’s in an equilibrium configuration, is completely useless, just like a towel that is in wetness equilibrium with you. Entropy measures the uselessness of a configuration of energy.131

Consider our divided box once again. But instead of the divider being a fixed wall with a hole in it, passively allowing molecules to move back and forth, imagine that the divider is movable, and hooked up to a shaft that reaches outside the box. What we’ve constructed is simply a piston, which can be used to do work under the right circumstances.

In Figure 44 we’ve depicted two different situations for our piston. The top row shows a piston in the presence of a low-entropy configuration of some gas—all the molecules on one side of the divider—while the bottom row shows a high-entropy configuration—equal amounts of gas on both sides. The total number of molecules, and the total amount of energy, is assumed to be the same in both cases; the only difference is the entropy. But it’s clear that what happens in the two cases is very different. In the top row, the gas is all on the left side of the piston, and the force of the molecules bumping into it exerts pressure that pushes the piston to the right until the gas fills the container. The moving piston shaft can be used to do useful work—run a flywheel or some such thing, at least for a little while. That extracts energy from the gas; at the end of the process, the gas will have a lower temperature. (The pistons in your car engine operate in exactly this way, expanding and cooling the hot vapor created by igniting vaporized gasoline, performing the useful work of moving your car.)


Figure 44: Gas in a divided box, used to drive a cylinder. On the top, gas in a low-entropy state pushes the cylinder to the right, doing useful work. On the bottom, gas in a high-entropy state doesn’t push the cylinder in either direction.

On the bottom row in the figure, meanwhile, we imagine starting with the same amount of energy in the gas but in an initial state with a much higher entropy—an equal number of particles on each side of the divider. High entropy implies equilibrium, which implies that the energy is useless, and indeed we see that our piston isn’t going anywhere. The pressure from gas on one side of the divider is exactly canceled by pressure coming from the other side. The gas in this box has the same total energy as the gas in the upper left box, but in this case we can’t harness that energy to make the piston move to do something useful.

This helps us understand the relationship between Boltzmann’s viewpoint on entropy and that of Rudolf Clausius, who first formulated the Second Law. Remember that Clausius and his predecessors didn’t think of entropy in terms of atoms at all; they thought of it as an autonomous substance with its own dynamics. Clausius’s original version of the Second Law didn’t even mention entropy; it was the simple statement that “heat never flows spontaneously from a colder object to a hotter one.” If we put two objects with different temperatures into contact with each other, they will both evolve toward a common middle temperature; if we put two objects with the same temperature into contact with each other, they will simply stay that way. (They’re in thermal equilibrium.)

From the point of view of atoms, this all makes sense. Consider the classic example of two objects at different temperatures in contact with each other: an ice cube in a glass of warm water, discussed at the end of the previous chapter. Both the ice cube and the liquid are made of precisely the same kind of molecules, namely H₂O. The only difference is that the ice is at a much lower temperature. Temperature, as we have discussed, measures the average energy of motion in the molecules of a substance. So while the molecules of the liquid water are moving relatively quickly, the molecules in the ice are moving slowly.

But that kind of condition—one set of molecules moving quickly, another moving slowly—isn’t all that different, conceptually, from two sets of molecules confined to different sides of a box. In either case, there is a broad-brush limitation on how we can rearrange things. If we had just a glass of nothing but water at a constant temperature, we could exchange the molecules in one part of the glass with molecules in some other part, and there would be no macroscopic way to tell the difference. But when we have an ice cube, we can’t simply exchange the molecules in the cube for some water molecules elsewhere in the glass—the ice cube would move, and we would certainly notice that even from our everyday macroscopic perspective. The division of the water molecules into “liquid” and “ice” puts a serious constraint on the number of rearrangements we can do, so that configuration has a low entropy. As the temperature between the water molecules that started out as ice equilibrates with that of the rest of the glass, the entropy goes up. Clausius’s rule that temperatures tend to even themselves out, rather than spontaneously flowing from cold to hot, is precisely equivalent to the statement that the entropy as defined by Boltzmann never decreases in a closed system.

None of this means that it’s impossible to cool things down, of course. But in everyday life, where most things around us are at similar temperatures, it takes a bit more ingenuity than heating them up. A refrigerator is a more complicated machine than a stove. (Refrigerators work on the same basic principle as the piston in Figure 44, expanding a gas to extract energy and cool it off.) When Grant Achatz, chef of Chicago’s Alinea restaurant, wanted a device that would rapidly freeze food in the same way a frying pan rapidly heats food up, he had to team with culinary technologist Philip Preston to create their own. The result is the “anti-griddle,” a microwave-oven-sized machine with a metallic top that attains a temperature of -34 degrees Celsius. Hot purees and sauces, poured on the anti-griddle, rapidly freeze on the bottom while remaining soft on the top. We have understood the basics of thermodynamics for a long time now, but we’re still inventing new ways to put them to good use.


You’re out one Friday night playing pool with your friends. We’re talking about real-world pool now, not “physicist pool” where we can ignore friction and noise.132 One of your pals has just made an impressive break, and the balls have scattered thoroughly across the table. As they come to a stop and you’re contemplating your next shot, a stranger walks by and exclaims, “Wow! That’s incredible!”

Somewhat confused, you ask what is so incredible about it. “Look at these balls at those exact positions on the table! What are the chances that you’d be able to put all the balls in precisely those spots? You’d never be able to repeat that in a million years!”

The mysterious stranger is a bit crazy—probably driven slightly mad by reading too many philosophical tracts on the foundations of statistical mechanics. But she does have a point. With several balls on the table, any particular configuration of them is extremely unlikely. Think of it this way: If you hit the cue ball into a bunch of randomly placed balls, which rattled around before coming to rest in a perfect arrangement as if they had just been racked, you’d be astonished. But that particular arrangement (all balls perfectly arrayed in the starting position) is no more or less unusual than any other precise arrangement of the balls.133 What right do we have to single out certain configurations of the billiard balls as “astonishing” or “unlikely,” while others seem “unremarkable” or “random”?

This example pinpoints a question at the heart of Boltzmann’s definition of entropy and the associated understanding of the Second Law of Thermodynamics: Who decides when two specific microscopic states of a system look the same from our macroscopic point of view?

Boltzmann’s formula for entropy hinges on the idea of the quantity W, which we defined as “the number of ways we can rearrange the microscopic constituents of a system without changing its macroscopic appearance.” In the last chapter we defined the “state” of a physical system to be a complete specification of all the information required to uniquely evolve it in time; in classical mechanics, it would be the position and momentum of every single constituent particle. Now that we are considering statistical mechanics, it’s useful to use the term microstate to refer to the precise state of a system, in contrast with the macrostate, which specifies only those features that are macroscopically observable. Then the shorthand definition of W is “the number of microstates corresponding to a particular macrostate.”

For the box of gas separated in two by a divider, the microstate at any one time is the position and momentum of every single molecule in the box. But all we were keeping track of was how many molecules were on the left, and how many were on the right. Implicitly, every division of the molecules into a certain number on the left and a certain number on the right defined a “macrostate” for the box. And our calculation of W simply counted the number of microstates per macrostate.134
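The counting behind W can be made concrete with a small sketch. It assumes the simplified divided-box picture in which each molecule is only labeled "left" or "right", so the number of microstates per macrostate is a binomial coefficient. The function names and the choice of 20 molecules are illustrative assumptions, not part of the original calculation.

```python
from math import comb, log

def microstates_per_macrostate(n_total: int, n_left: int) -> int:
    """W: the number of ways to choose which n_left of n_total molecules are on the left."""
    return comb(n_total, n_left)

def boltzmann_entropy(n_total: int, n_left: int) -> float:
    """Entropy in units of Boltzmann's constant: S / k = log W."""
    return log(microstates_per_macrostate(n_total, n_left))

N = 20  # hypothetical and tiny compared with a real box of gas
for n_left in (0, 5, 10):
    w = microstates_per_macrostate(N, n_left)
    s = boltzmann_entropy(N, n_left)
    print(f"{n_left:2d} on the left: W = {w}, S/k = {s:.2f}")
```

Even with only 20 molecules, the evenly split macrostate contains far more microstates than the lopsided ones; the gap grows enormously as the number of molecules increases.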

The choice to just keep track of how many molecules were in each half of the box seemed so innocent at the time. But we could imagine keeping track of much more. Indeed, when we deal with the atmosphere in an actual room, we keep track of a lot more than simply how many molecules are on each side of the room. We might, for example, keep track of the temperature, and density, and pressure of the atmosphere at every point, or at least at some finite number of places. If there were more than one kind of gas in the atmosphere, we might separately keep track of the density and so on for each different kind of gas. That’s still enormously less information than the position and momentum of every molecule in the room, but the choice of which information to “keep” as a macroscopically measurable quantity and which information to “forget” as an irrelevant part of the microstate doesn’t seem to be particularly well defined.

The process of dividing up the space of microstates of some particular physical system (gas in a box, a glass of water, the universe) into sets that we label “macroscopically indistinguishable” is known as coarse-graining. It’s a little bit of black magic that plays a crucial role in the way we think about entropy. In Figure 45 we’ve portrayed how coarse-graining works; it simply divides up the space of all states of a system into regions (macrostates) that are indistinguishable by macroscopic observations. Every point within one of those regions corresponds to a different microstate, and the entropy associated with a given microstate is proportional to the logarithm of the area (or really volume, as it’s a very high-dimensional space) of the macrostate to which it belongs. This kind of figure makes it especially clear why entropy tends to go up: Starting from a state with low entropy, corresponding to a very tiny part of the space of states, it’s only to be expected that an ordinary system will tend to evolve to states that are located in one of the large-volume, high-entropy regions.

Figure 45 is not to scale; in a real example, the low-entropy macrostates would be much smaller compared to the high-entropy macrostates. As we saw with the divided-box example, the number of microstates corresponding to high-entropy macrostates is enormously larger than the number associated with low-entropy macrostates. Starting with low entropy, it’s certainly no surprise that a system should wander into the roomier high-entropy parts of the space of states; but starting with high entropy, a typical system can wander for a very long time without ever bumping into a low-entropy condition. That’s what equilibrium is like; it’s not that the microstate is truly static, but that it never leaves the high-entropy macrostate it’s in.


Figure 45: The process of coarse-graining consists of dividing up the space of all possible microstates into regions considered to be macroscopically indistinguishable, which are called macrostates. Each macrostate has an associated entropy, proportional to the logarithm of the volume it takes up in the space of states. The size of the low-entropy regions is exaggerated for clarity; in reality, they are fantastically smaller than the high-entropy regions.

This whole business should strike you as just a little bit funny. Two microstates belong to the same macrostate when they are macroscopically indistinguishable. But that’s just a fancy way of saying, “when we can’t tell the difference between them on the basis of macroscopic observations.” It’s the appearance of “we” in that statement that should make you nervous. Why should our powers of observation be involved in any way at all? We like to think of entropy as a feature of the world, not as a feature of our ability to perceive the world. Two glasses of water are in the same macrostate if they have the same temperature throughout the glass, even if the exact distribution of positions and momenta of the water molecules are different, because we can’t directly measure all of that information. But what if we ran across a race of superobservant aliens who could peer into a glass of liquid and observe the position and momentum of every molecule? Would such a race think that there was no such thing as entropy?

There are several different answers to these questions, none of which is found satisfactory by everyone working in the field of statistical mechanics. (If any of them were, you would need only that one answer.) Let’s look at two of them.

The first answer is, it really doesn’t matter. That is, it might matter a lot to you how you bundle up microstates into macrostates for the purposes of the particular physical situation in front of you, but it ultimately doesn’t matter if all we want to do is argue for the validity of something like the Second Law. From Figure 45, it’s clear why the Second Law should hold: There is a lot more room corresponding to high-entropy states than to low-entropy ones, so if we start in the latter it is natural to wander into the former. But that will hold true no matter how we actually do the coarse-graining. The Second Law is robust; it depends on the definition of entropy as the logarithm of a volume within the space of states, but not on the precise way in which we choose that volume. Nevertheless, in practice we do make certain choices and not others, so this transparent attempt to avoid the issue is not completely satisfying.

The second answer is that the choice of how to coarse-grain is not completely arbitrary and socially constructed, even if some amount of human choice does come into the matter. The fact is, we coarse-grain in ways that seem physically natural, not just chosen at whim. For example, when we keep track of the temperature and pressure in a glass of water, what we’re really doing is throwing away all information that we could measure only by looking through a microscope. We’re looking at average properties within relatively small regions of space because that’s what our senses are actually able to do. Once we choose to do that, we are left with a fairly well-defined set of macroscopically observable quantities.

Averaging within small regions of space isn’t a procedure that we hit upon randomly, nor is it a peculiarity of our human senses as opposed to the senses of a hypothetical alien; it’s a very natural thing, given how the laws of physics work.135 When I look at cups of coffee and distinguish between cases where a teaspoon of milk has just been added and ones where the milk has become thoroughly mixed, I’m not pulling a random coarse-graining of the states of the coffee out of my hat; that’s how the coffee looks to me, immediately and phenomenologically. So even though in principle our choice of how to coarse-grain microstates into macrostates seems absolutely arbitrary, in practice Nature hands us a very sensible way to do it.


A remarkable consequence of Boltzmann’s statistical definition of entropy is that the Second Law is not absolute—it just describes behavior that is overwhelmingly likely. If we start with a medium-entropy macrostate, almost all microstates within it will evolve toward higher entropy in the future, but a small number will actually evolve toward lower entropy.

It’s easy to construct an explicit example. Consider a box of gas, in which the gas molecules all happened to be bunched together in the middle of the box in a low-entropy configuration. If we just let it evolve, the molecules will move around, colliding with one another and with the walls of the box, and ending up (with overwhelming probability) in a much higher-entropy configuration.

Now consider a particular microstate of the above box of gas at some moment after it has become high-entropy. From there, construct a new state by keeping all of the molecules at exactly the same positions, but precisely reversing all of the velocities. The resulting state still has a high entropy—it’s contained within the same macrostate as we started with. (If someone suddenly reversed the direction of motion of every single molecule of air around you, you’d never notice; on average there are equal numbers moving in every direction.) Starting in this state, the motion of the molecules will exactly retrace the path that they took from the previous low-entropy state. To an external observer, it will look as if the entropy is spontaneously decreasing. The fraction of high-entropy states that have this peculiar property is astronomically small, but they certainly exist.


Figure 46: On the top row, ordinary evolution of molecules in a box from a low-entropy initial state to a high-entropy final state. At the bottom, we carefully reverse the momentum of every particle in the final state from the top, to obtain a time-reversed evolution in which entropy decreases.
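The velocity-reversal argument can be tried out on a toy model. The sketch below is a deliberately simplified stand-in, not the gas described in the text: the molecules live on a ring of integer sites, move with fixed integer velocities, and never collide. Those choices (and all the names and numbers) are assumptions made so that the dynamics is exactly reversible on a computer. Entropy is coarse-grained by counting only how many molecules are in the left half of the ring.

```python
import random
from math import comb, log

RING_SIZE = 1000   # number of sites on the ring (hypothetical toy box)
NUM_MOLECULES = 200
STEPS = 137        # how long to evolve before reversing

def evolve(positions, velocities, steps):
    """Move every molecule by its integer velocity each step; the ring wraps around."""
    for _ in range(steps):
        positions = [(x + v) % RING_SIZE for x, v in zip(positions, velocities)]
    return positions

def entropy(positions):
    """Coarse-grained S/k = log W, with the macrostate given by the count in the left half."""
    n_left = sum(1 for x in positions if x < RING_SIZE // 2)
    return log(comb(len(positions), n_left))

rng = random.Random(0)
nonzero_speeds = [v for v in range(-7, 8) if v != 0]
start = [rng.randrange(10) for _ in range(NUM_MOLECULES)]        # bunched up: low entropy
velocities = [rng.choice(nonzero_speeds) for _ in range(NUM_MOLECULES)]

spread = evolve(start, velocities, STEPS)                        # ordinary evolution
flipped = [-v for v in velocities]                               # reverse every velocity
back = evolve(spread, flipped, STEPS)                            # retraces the path

print("start :", entropy(start))
print("spread:", entropy(spread))
print("back  :", entropy(back))
```

Flipping the velocities does not change the coarse-grained entropy of the spread-out state, yet running it forward returns the molecules exactly to their bunched-up starting positions.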

We could even imagine an entire universe that was like that, if we believe that the fundamental laws are reversible. Take our universe today: It is described by some particular microstate, which we don’t know, although we know something about the macrostate to which it belongs. Now simply reverse the momentum of every single particle in the universe and, moreover, do whatever extra transformations (changing particles to antiparticles, for example) are needed to maintain the integrity of time reversal. Then let it go. What we would see would be an evolution toward the “future” in which the universe collapsed, stars and planets unformed, and entropy generally decreased all around; it would just be the history of our actual universe played backward in time.

However—the thought experiment of an entire universe with a reversed arrow of time is much less interesting than that of some subsystem of the universe with a reversed arrow. The reason is simple: Nobody would ever notice.

In Chapter One we asked what it would be like if time passed more quickly or more slowly. The crucial question there was: Compared to what? The idea that “time suddenly moves more quickly for everyone in the world” isn’t operationally meaningful; we measure time by synchronized repetition, and as long as clocks of all sorts (including biological clocks and the clocks defined by subatomic processes) remain properly synchronized, there’s no way you could tell that the “rate of time” was in any way different. It’s only if some particular clock speeds up or slows down compared to other clocks that the concept makes any sense.

Exactly the same problem is attached to the idea of “time running backward.” When we visualize time going backward, we might imagine some part of the universe running in reverse, like an ice cube spontaneously forming out of a cool glass of water. But if the whole thing ran in reverse, it would be precisely the same as it appears now. It would be no different than running the universe forward in time, but choosing some perverse time coordinate that ran in the opposite direction.

The arrow of time isn’t a consequence of the fact that “entropy increases to the future”; it’s a consequence of the fact that “entropy is very different in one direction of time than the other.” If there were some other part of the universe, which didn’t interact with us in any way, where entropy decreased toward what we now call the future, the people living in that reversed-time world wouldn’t notice anything out of the ordinary. They would experience an ordinary arrow of time and claim that entropy was lower in their past (the time of which they have memories) and grew to the future. The difference is that what they mean by “the future” is what we call “the past,” and vice versa. The direction of the time coordinate on the universe is completely arbitrary, set by convention; it has no external meaning. The convention we happen to prefer is that “time” increases in the direction that entropy increases. The important thing is that entropy increases in the same temporal direction for everyone within the observable universe, so that they can agree on the direction of the arrow of time.

Of course, everything changes if two people (or other subsets of the physical universe) who can actually communicate and interact with each other disagree on the direction of the arrow of time. Is it possible for my arrow of time to point in a different direction than yours?


We opened Chapter Two with a few examples of incompatible arrows of time in literature—stories featuring some person or thing that seemed to experience time backward. The homunculus narrator of Time’s Arrow remembered the future but not the past; the White Queen experienced pain just before she pricked her finger; and the protagonist of F. Scott Fitzgerald’s “The Curious Case of Benjamin Button” grew physically younger as time passed, although his memories and experiences accumulated in the normal way. We now have the tools to explain why none of those things happen in the real world.

As long as the fundamental laws of physics are perfectly reversible, given the precise state of the entire universe (or any closed system) at any one moment in time, we can use those laws to determine what the state will be at any future time, or what it was at any past time. We usually take that time to be the “initial” time, but in principle we could choose any moment, and in the present context, when we’re worried about arrows of time pointing in different directions, there is no time that is initial for everything. So what we want to ask is: Why is it difficult or impossible to choose a state of the universe with the property that, as we evolve it forward in time, some parts of it have increasing entropy and some parts have decreasing entropy?

At first it would seem simple enough. Take two boxes of gas molecules. Prepare one of them in some low-entropy state, as in the top left of Figure 46; once the molecules are let go, their entropy will go up as expected. Prepare the other box by taking a high-entropy state that has just evolved from a low-entropy state, and reversing all of the velocities, as at the bottom left. That second box is delicately constructed so that the entropy will decrease with time. So, starting from that initial condition in both boxes, we will see the entropy evolve in opposite directions.

But we want more than that. It’s not very interesting to have two completely separate systems with oppositely directed arrows of time. We would like to have systems that interact—one system can somehow communicate with the other.

And that ruins everything.136 Imagine we started with these two boxes, one of which had an entropy that was ready to go up and the other ready to go down. But then we introduced a tiny interaction that connected the boxes—say, a few photons moving between the boxes, bouncing off a molecule in one before returning to the other. Certainly the interaction of Benjamin Button’s body with the rest of the world is much stronger than that. (Likewise the White Queen, or Martin Amis’s narrator in Time’s Arrow.)

That extra little interaction will slightly alter the velocities of the molecules with which it interacts. (Momentum is conserved, so it has no choice.) That’s no problem for the box that starts with low entropy, as there is no delicate tuning required to make the entropy go up. But it completely ruins our attempt to set up conditions in the other box so that entropy goes down. Just a tiny change in velocity will quickly propagate through the gas, as one affected molecule hits another molecule, and then they hit two more, and so on. It was necessary for all of the velocities to be very precisely aligned to make the gas miraculously conspire to decrease its entropy, and any interaction we might want to introduce will destroy the required conspiracy. The entropy in the first box will very sensibly go up, while the entropy in the other will just stay high; that subsystem will basically stay in equilibrium. You can’t have incompatible arrows of time among interacting subsystems of the universe.137


We often say that entropy measures disorder. That’s a shorthand translation of a very specific concept into somewhat sloppy language—perfectly adequate as a quick gloss, but there are ways in which it can occasionally go wrong. Now that we know the real definition of entropy given by Boltzmann, we can understand how close this informal idea comes to the truth.

The question is, what do you mean by “order”? That’s not a concept that can easily be made rigorous, as we have done with entropy. In our minds, we associate “order” with a condition of purposeful arrangement, as opposed to a state of randomness. That certainly bears a family resemblance to the way we’ve been talking about entropy. An egg that has not yet been broken seems more orderly than one that we have split apart and whisked into a smooth consistency.

Entropy seems naturally to be associated with disorder because, more often than not, there are more ways to be disordered than to be ordered. A classic example of the growth of entropy is the distribution of papers on your desk. You can put them into neat piles—orderly, low entropy—and over time they will tend to get scattered across the desktop—disorderly, high entropy. Your desk is not a closed system, but the basic idea is on the right track.

But if we push too hard on the association, it doesn’t quite hold up. Consider the air molecules in the room you’re sitting in right now—presumably spread evenly throughout the room in a high-entropy configuration. Now imagine those molecules were instead collected into a small region in the center of the room, just a few centimeters across, taking on the shape of a miniature replica of the Statue of Liberty. That would be, unsurprisingly, much lower entropy—and we would all agree that it also seemed to be more orderly. But now imagine that all the gas in the room was collected into an extremely tiny region, only 1 millimeter across, in the shape of an amorphous blob. Because the region of space covered by the gas is even smaller now, the entropy of that configuration is lower than in the Statue of Liberty example. (There are more ways to rearrange the molecules within a medium-sized statuette than there are within a very tiny blob.) But it’s hard to argue that an amorphous blob is more “orderly” than a replica of a famous monument, even if the blob is really small. So in this case the correlation between orderliness and low entropy seems to break down, and we need to be more careful.
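The comparison between the statuette and the blob can be given a rough quantitative form. This is a sketch under stated assumptions: the gas is treated as ideal and at the same temperature in both configurations, so that the number of available microstates for N molecules confined to a volume V scales as V raised to the power N. The volumes themselves are left as symbols, since the text gives only approximate sizes.

```latex
S = k \log W, \qquad W \propto V^{N}
\quad\Longrightarrow\quad
S_{\text{statue}} - S_{\text{blob}}
  = N k \log\!\left(\frac{V_{\text{statue}}}{V_{\text{blob}}}\right) > 0
\quad \text{whenever } V_{\text{statue}} > V_{\text{blob}}.
```

Only the volume available to the molecules enters this estimate; the shape of the region, which is what makes the statuette look "orderly," does not appear at all.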

That example seems a bit contrived, but we actually don’t have to work that hard to see the relationship between entropy and disorder break down. In keeping with our preference for kitchen-based examples, consider oil and vinegar. If you shake oil and vinegar together to put on a salad, you may have noticed that they tend to spontaneously unmix themselves if you set the mixture down and leave it to its own devices. This is not some sort of spooky violation of the Second Law of Thermodynamics. Vinegar is made mostly of water, and water molecules tend to stick to oil molecules; due to the chemical properties of oil and water, they stick in very particular configurations. So when oil and water (or vinegar) are thoroughly mixed, the water molecules cling to the oil molecules in specific arrangements, corresponding to a relatively low-entropy state. Whereas, when the two substances are largely segregated, the individual molecules can move freely among the other molecules of similar type. At room temperature, it turns out that oil and water have a higher entropy in the unmixed state than in the mixed state.138 Order appears spontaneously at the macroscopic level, but it’s ultimately a matter of disorder at the microscopic level.

Things are also subtle for really big systems. Instead of the gas in a room, consider an astronomical-sized cloud of gas and dust—say, an interstellar nebula. That seems pretty disorderly and high-entropy. But if the nebula is big enough, it will contract under its own gravity and eventually form stars, perhaps with planets orbiting around them. Because such a process obeys the Second Law, we can be sure that the entropy goes up along the way (as long as we keep careful track of all the radiation emitted during the collapse and so forth). But a star with several orbiting planets seems, at least informally, to be more orderly than a dispersed interstellar cloud of gas. The entropy went up, but so did the amount of order, apparently.

The culprit in this case is gravity. We’re going to have a lot to say about how gravity wreaks havoc with our everyday notions of entropy, but for now suffice it to say that the interaction of gravity with other forces seems to be able to create order while still making the entropy go up—temporarily, anyway. That is a deep clue to something important about how the universe works; sadly, we aren’t yet sure what that clue is telling us.

For the time being, let’s recognize that the association of entropy with disorder is imperfect. It’s not bad—it’s okay to explain entropy informally by invoking messy desktops. But what entropy really is telling us is how many microstates are macroscopically indistinguishable. Sometimes that has a simple relationship with orderliness, sometimes not.


There are a couple of other nagging worries about Boltzmann’s approach to the Second Law that we should clean up, or at least bring out into the open. We have this large set of microstates, which we divide up into macrostates, and declare that the entropy is the logarithm of the number of microstates per macrostate. Then we are asked to swallow another considerable bite: The proposition that each microstate within a macrostate is “equally likely.”

Following Boltzmann’s lead, we want to argue that the reason why entropy tends to increase is simply that there are more ways to be high-entropy than to be low-entropy, just by counting microstates. But that wouldn’t matter the least bit if a typical system spent a lot more time in the relatively few low-entropy microstates than it did in the many high-entropy ones. Imagine if the microscopic laws of physics had the property that almost all high-entropy microstates tended to naturally evolve toward a small number of low-entropy states. In that case, the fact that there were more high-entropy states wouldn’t make any difference; we would still expect to find the system in a low-entropy state if we waited long enough.

It’s not hard to imagine weird laws of physics that behave in exactly this way. Consider the billiard balls once again, moving around according to perfectly normal billiard-ball behavior, with one crucial exception: Every time a ball bumps into a particular one of the walls of the table, it sticks there, coming immediately to rest. (We’re not imagining that someone has put glue on the rail or any such thing that could ultimately be traced to reversible behavior at the microscopic level, but contemplating an entirely new law of fundamental physics.) Note that the space of states for these billiard balls is exactly what it would be under the usual rules: Once we specify the position and momentum of every ball, we can precisely predict the future evolution. It’s just that the future evolution, with overwhelming probability, ends up with all of the balls stuck on one wall of the table. That’s a very low-entropy configuration; there aren’t many microstates like that. In such a world, entropy would spontaneously decrease even for the closed system of the pool table.

It should be clear what’s going on in this concocted example: The new law of physics is not reversible. It’s much like checkerboard D from the last chapter, where diagonal lines of gray squares would run into a particular vertical column and simply come to an end. Knowing the positions and momenta of all the balls on this funky table is sufficient to predict the future, but it is not good enough to reconstruct the past. If a ball is stuck to the wall, we have no idea how long it has been there.

The real laws of physics seem to be reversible at a fundamental level. This is, if we think about it a bit, enough to guarantee that high-entropy states don’t evolve preferentially into low-entropy states. Remember that reversibility is based on conservation of information: The information required to specify the state at one time is preserved as it evolves through time. That means that two different states now will always evolve into two different states some given amount of time in the future; if they evolved into the same state, we wouldn’t be able to reconstruct the past of that state. So it’s just impossible that high-entropy states all evolve preferentially into low-entropy states, because there aren’t enough low-entropy states to allow it to happen. This is a technical result called Liouville’s Theorem, after French mathematician Joseph Liouville.
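The counting argument in this paragraph can be illustrated with a discrete stand-in. The sketch below assumes a made-up system with 1,000 states, of which 10 are labeled low-entropy; a reversible law is modeled as a permutation of the states (distinct states always go to distinct states), and the sticky-wall law as a map that sends everything to a single state. All names and numbers are hypothetical.

```python
import random

NUM_STATES = 1000
LOW_ENTROPY = set(range(10))                 # a small, hypothetical low-entropy macrostate
HIGH_ENTROPY = set(range(10, NUM_STATES))    # everything else

def count_landing_in_low(law):
    """Count the high-entropy states that this law sends into the low-entropy macrostate."""
    return sum(1 for state in HIGH_ENTROPY if law[state] in LOW_ENTROPY)

# A reversible law: a permutation, so no two states ever evolve into the same state.
rng = random.Random(1)
shuffled = list(range(NUM_STATES))
rng.shuffle(shuffled)
reversible_law = dict(enumerate(shuffled))

# An irreversible "sticky wall" law: every state ends up in state 0.
sticky_law = {state: 0 for state in range(NUM_STATES)}

print(count_landing_in_low(reversible_law))  # can never exceed len(LOW_ENTROPY)
print(count_landing_in_low(sticky_law))      # every high-entropy state
```

Under any permutation, at most 10 of the 990 high-entropy states can land in the low-entropy set, simply because there are only 10 places to go. The many-to-one law has no such limit, which is exactly what makes it irreversible.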

That’s almost what we want, but not quite. And what we want (as so often in life) is not something we can really get. Let’s say that we have some system, and we know what macrostate it is in, and we would like to say something about what will happen next. It might be a glass of water with an ice cube floating in it. Liouville’s Theorem says that most microstates in that macrostate will have to increase in entropy or stay the same, just as the Second Law would imply—the ice cube is likely to melt. But the system is in some particular microstate, even if we don’t know which one. How can we be sure that the microstate isn’t one of the very tiny number that is going to dramatically decrease in entropy any minute now? How can we guarantee that the ice cube isn’t actually going to grow a bit, while the water around it heats up?

The answer is: We can’t. There is bound to be some particular microstate, very rare in the ice-cube-and-water macrostate we are considering, that actually evolves toward an even lower-entropy microstate. Statistical mechanics, the version of thermodynamics based on atoms, is essentially probabilistic—we don’t know for sure what is going to happen; we can only argue that certain outcomes are overwhelmingly likely. At least, that’s what we’d like to be able to argue. What we can honestly argue is that most medium-entropy states evolve into higher-entropy states rather than lower-entropy ones. But you’ll notice a subtle difference between “most microstates within this macrostate evolve to higher entropy” and “a microstate within this macrostate is likely to evolve to higher entropy.” The first statement is just about counting the relative number of microstates with different properties (“ice cube melts” vs. “ice cube grows”), but the second statement is a claim about the probability of something happening in the real world. Those are not quite the same thing. There are more Chinese people in the world than there are Lithuanians; but that doesn’t mean that you are more likely to run into a Chinese person than a Lithuanian, if you just happen to be walking down the streets of Vilnius.

Conventional statistical mechanics, in other words, makes a crucial assumption: Given that we know we are in a certain macrostate, and that we understand the complete set of microstates corresponding to that macrostate, we can assume that all such microstates are equally likely. We can’t avoid invoking some assumption along these lines; otherwise there’s no way of making the leap from counting states to assigning probabilities. The equal-likelihood assumption has a name that makes it sound like a dating strategy for people who prefer to play hard to get: the “Principle of Indifference.” It was championed in the context of probability theory, long before statistical mechanics even came on the scene, by our friend Pierre-Simon Laplace. He was a die-hard determinist, but understood as well as anyone that we usually don’t have access to all possible facts, and wanted to understand what we can say in situations of incomplete knowledge.

And the Principle of Indifference is basically the best we can do. When all we know is that a system is in a certain macrostate, we assume that every microstate within that macrostate is equally likely. (With one profound exception—the Past Hypothesis—to be discussed at the end of this chapter.) It would be nice if we could prove that this assumption should be true, and people have tried to do that. For example, if a system were to evolve through every possible microstate (or at least, through a set of microstates that came very close to every possible microstate) in a reasonable period of time, and we didn’t know where it was in that evolution, there would be some justification for treating all microstates as equally likely. A system that wanders all over the space of states and covers every possibility (or close to it) is known as “ergodic.” The problem is, even if a system is ergodic (and not all systems are), it would take forever to actually evolve close to every possible state. Or, if not forever, at least a horrifically long time. There are just too many states for a macroscopic system to sample them all in a time less than the age of the universe.
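A back-of-the-envelope calculation shows how quickly the number of states outruns the time available. The figures below are assumptions chosen for illustration: only 100 molecules, each recorded merely as "left" or "right," and a system that somehow visits a billion distinct configurations every second. The age of the universe is taken as roughly 13.8 billion years.

```python
N_MOLECULES = 100                   # hypothetical; a real gas has vastly more
STATES_VISITED_PER_SECOND = 1e9     # assumed sampling rate, for illustration only
AGE_OF_UNIVERSE_SECONDS = 4.35e17   # roughly 13.8 billion years

num_states = 2 ** N_MOLECULES       # each molecule is either on the left or on the right
seconds_needed = num_states / STATES_VISITED_PER_SECOND
ages_needed = seconds_needed / AGE_OF_UNIVERSE_SECONDS

print(f"{num_states:.2e} configurations")
print(f"{ages_needed:.0f} ages of the universe to visit each one once")
```

Even this drastically simplified system would need thousands of ages of the universe to pass through every configuration once, and each additional molecule doubles the count.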

The real reason we use the Principle of Indifference is that we don’t know any better. And, of course, because it seems to work.


We’ve been pretty definitive about what we mean by “entropy” and “the arrow of time.” Entropy counts the number of macroscopically indistinguishable states, and the arrow of time arises because entropy increases uniformly throughout the observable universe. The real world being what it is, however, other people often use these words to mean slightly different things.

The definition of entropy we have been working with—the one engraved on Boltzmann’s tombstone—associates a specific amount of entropy with each individual microstate. A crucial part of the definition is that we first decide on what counts as “macroscopically measurable” features of the state, and then use those to coarse-grain the entire space of states into a set of macrostates. To calculate the entropy of a microstate, we count the total number of microstates that are macroscopically indistinguishable from it, then take the logarithm.

But notice something interesting: As a state evolving through time moves from a low-entropy condition to a high-entropy condition, if we choose to forget everything other than the macrostate to which it belongs, we end up knowing less and less about which state we actually have in mind. In other words, if we are told that a system belongs to a certain macrostate, the probability that it is any particular microstate within that macrostate decreases as the entropy increases, just because there are more possible microstates it could be. Our information about the state—how accurately we have pinpointed which microstate it is—goes down as the entropy goes up.

This suggests a somewhat different way of defining entropy in the first place, a way that is most closely associated with Josiah Willard Gibbs. (Boltzmann actually investigated similar definitions, but it’s convenient for us to associate this approach with Gibbs, since Boltzmann already has his.) Instead of thinking of entropy as something that characterizes individual states—namely, the number of other states that look macroscopically similar—we could choose to think of entropy as characterizing what we know about the state. In the Boltzmann way of thinking about entropy, the knowledge of which macrostate we are in tells us less and less about the microstate as entropy increases; the Gibbs approach inverts this perspective and defines entropy in terms of how much we know. Instead of starting with a coarse-graining on the space of states, we start with a probability distribution: the percentage chance, for each possible microstate, that the system is actually in that microstate right now. Then Gibbs gives us a formula, analogous to Boltzmann’s, for calculating the entropy associated with that probability distribution.139 Coarse-graining never comes into the game.
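As a rough illustration, the Gibbs entropy in its standard textbook form is minus the sum of p log p over all microstates. The sketch below assumes that form with Boltzmann's constant set to 1; the function name `gibbs_entropy` is only illustrative.

```python
from math import log

def gibbs_entropy(probs):
    """Standard Gibbs form: S = -sum(p * log(p)), with k = 1.

    Terms with p == 0 are skipped, following the usual convention
    that 0 * log(0) counts as 0.
    """
    return -sum(p * log(p) for p in probs if p > 0)

# Equal probability over W microstates reproduces Boltzmann's log W.
W = 4
print(gibbs_entropy([1 / W] * W), log(W))

# Knowing the microstate exactly means zero entropy.
print(gibbs_entropy([1.0, 0.0, 0.0, 0.0]))
```

When every microstate in a macrostate is taken to be equally likely, the two definitions agree, which is one way to see why the formulas are called analogous.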

Neither the Boltzmann formula nor the Gibbs formula for entropy is the “right” one. They both are things you can choose to define, and manipulate, and use to help understand the world; each comes with its advantages and disadvantages. The Gibbs formula is often used in applications, for one very down-to-Earth reason: It’s easy to calculate with. Because there is no coarse-graining, there is no discontinuous jump in entropy when a system goes from one macrostate to another; that’s a considerable benefit when solving equations.

But the Gibbs approach also has two very noticeable disadvantages. One is epistemic: It associates the idea of “entropy” with our knowledge of the system, rather than with the system itself. This has caused all kinds of mischief among the community of people who try to think carefully about what entropy really means. Arguments go back and forth, but the approach I have taken in this book, which treats entropy as a feature of the state rather than a feature of our knowledge, seems to avoid most of the troublesome issues.

The other disadvantage is more striking: If you know the laws of physics and use them to study how the Gibbs entropy evolves with time, you find that it never changes. A bit of reflection convinces us that this must be true. The Gibbs entropy characterizes how well we know what the state is. But under the influence of reversible laws, that’s a quantity that doesn’t change—information isn’t created or destroyed. For the entropy to go up, we would have to know less about the state in the future than we know about it now; but we can always run the evolution backward to see where it came from, so that can’t happen. To derive something like the Second Law from the Gibbs approach, you have to “forget” something about the evolution. When you get right down to it, that’s philosophically equivalent to the coarse-graining we had to do in the Boltzmann approach; we’ve just moved the “forgetting” step to the equations of motion, rather than the space of states.
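A toy check of that claim, under a loud assumption: reversible dynamics on a finite set of states is modeled here as nothing more than a fixed shuffling (a permutation) of the states. The names `evolve` and `perm` are hypothetical, and the entropy is the standard Gibbs form with Boltzmann's constant set to 1.

```python
import random
from math import log

def gibbs_entropy(probs):
    """Standard Gibbs form, S = -sum(p * log(p)), with k = 1."""
    return -sum(p * log(p) for p in probs if p > 0)

def evolve(probs, perm):
    """One step of reversible dynamics: state i always moves to state perm[i]."""
    new_probs = [0.0] * len(probs)
    for i, p in enumerate(probs):
        new_probs[perm[i]] = p
    return new_probs

rng = random.Random(0)
n_states = 8
perm = list(range(n_states))
rng.shuffle(perm)  # an arbitrary one-to-one rule (assumption of this sketch)

probs = [0.5, 0.3, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0]
s_before = gibbs_entropy(probs)
for _ in range(5):
    probs = evolve(probs, perm)
s_after = gibbs_entropy(probs)

print(s_before, s_after)
```

The probabilities get carried to different states, but the list of probability values is never altered, so the entropy computed from it cannot move.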

Nevertheless, there’s no question that the Gibbs formula for entropy is extremely useful in certain applications, and people are going to continue to take advantage of it. And that’s not the end of it; there are several other ways of thinking about entropy, and new ones are frequently being proposed in the literature. There’s nothing wrong with that; after all, Boltzmann and Gibbs were proposing definitions to supersede Clausius’s perfectly good definition of entropy, which is still used today under the rubric of “thermodynamic” entropy. After quantum mechanics came on the scene, John von Neumann proposed a formula for entropy that is specifically adapted to the quantum context. As we’ll discuss in the next chapter, Claude Shannon suggested a definition of entropy that was very similar in spirit to Gibbs’s, but in the framework of information theory rather than physics. The point is not to find the one true definition of entropy; it’s to come up with concepts that serve useful functions in the appropriate contexts. Just don’t let anyone bamboozle you by pretending that one definition or the other is the uniquely correct meaning of entropy.

Just as there are many definitions of entropy, there are many different “arrows of time,” another source of potential bamboozlement. We’ve been dealing with the thermodynamic arrow of time, the one defined by entropy and the Second Law. There is also the cosmological arrow of time (the universe is expanding), the psychological arrow of time (we remember the past and not the future), the radiation arrow of time (electromagnetic waves flow away from moving charges, not toward them), and so on. These different arrows fall into different categories. Some, like the cosmological arrow, reflect facts about the evolution of the universe but are nevertheless completely reversible. It might end up being true that the ultimate explanation for the thermodynamic arrow also explains the cosmological arrow (in fact it seems quite plausible), but the expansion of the universe doesn’t present any puzzle with respect to the microscopic laws of physics in the same way the increase of entropy does. Meanwhile, the arrows that reflect true irreversibilities—the psychological arrow, radiation arrow, and even the arrow defined by quantum mechanics we will investigate later—all seem to be reflections of the same underlying state of affairs, characterized by the evolution of entropy. Working out the details of how they are all related is undeniably important and interesting, but I will continue to speak of “the” arrow of time as the one defined by the growth of entropy.


Once Boltzmann had understood entropy as a measure of how many microstates fit into a given macrostate, his next goal was to derive the Second Law of Thermodynamics from that perspective. I’ve already given the basic reasons why the Second Law works—there are more ways to be high-entropy than low-entropy, and distinct starting states evolve into distinct final states, so most of the time (with truly overwhelming probability) we would expect entropy to go up. But Boltzmann was a good scientist and wanted to do better than that; he wanted to prove that the Second Law followed from his formulation.

It’s hard to put ourselves in the shoes of a late-nineteenth-century thermodynamicist. Those folks felt that the inability of entropy to decrease in a closed system was not just a good idea; it was a Law. The idea that entropy would “probably” increase wasn’t any more palatable than a suggestion that energy would “probably” be conserved would have been. In reality, the numbers are just so overwhelming that the probabilistic reasoning of statistical mechanics might as well be absolute, for all intents and purposes. But Boltzmann wanted to prove something more definite than that.

In 1872, Boltzmann (twenty-eight years old at the time) published a paper in which he purported to use kinetic theory to prove that entropy would always increase or remain constant—a result called the “H-Theorem,” which has been the subject of countless debates ever since. Even today, some people think that the H-Theorem explains why the Second Law holds in the real world, while others think of it as an amusing relic of intellectual history. The truth is that it’s an interesting result for statistical mechanics but falls short of “proving” the Second Law.

Boltzmann reasoned as follows. In a macroscopic object such as a room full of gas or a cup of coffee with milk, there are a tremendous number of molecules—more than 10^24. He considered the special case where the gas is relatively dilute, so that two particles might bump into each other, but we can ignore those rare events when three or more particles bump into one another at the same time. (That really is an unobjectionable assumption.) We need some way of characterizing the macrostate of all these particles. So instead of keeping track of the position and momentum of every molecule (which would be the whole microstate), let’s keep track of the average number of particles that have any particular position and momentum. In a box of gas in equilibrium at a certain temperature, for example, the average number of particles is equal at every position in the box, and there will be a certain distribution of momenta, so that the average energy per particle gives the right temperature. Given just that information, you can calculate the entropy of the gas. And then you could prove (if you were Boltzmann) that the entropy of a gas that is not in equilibrium will go up as time goes by, until it reaches its maximum value, and then it will just stay there. The Second Law has, apparently, been derived.140

But there is clearly something fishy going on. We started with microscopic laws of physics that are perfectly time-reversal invariant—they work equally well running forward or backward in time. And then Boltzmann claimed to derive a result from them that is manifestly not time-reversal invariant—one that demonstrates a clear arrow of time, by saying that entropy increases toward the future. How can you possibly get irreversible conclusions from reversible assumptions?

This objection was put forcefully in 1876 by Josef Loschmidt, after similar concerns had been expressed by William Thomson (Lord Kelvin) and James Clerk Maxwell. Loschmidt was close friends with Boltzmann and had served as a mentor to the younger physicist in Vienna in the 1860s. And he was no skeptic of atomic theory; in fact Loschmidt was the first scientist to accurately estimate the physical sizes of molecules. But he couldn’t understand how Boltzmann could have derived time asymmetry without sneaking it into his assumptions.

The argument behind what is now known as “Loschmidt’s reversibility objection” is simple. Consider some specific microstate corresponding to a low-entropy macrostate. It will, with overwhelming probability, evolve toward higher entropy. But time-reversal invariance guarantees that for every such evolution, there is another allowed evolution—the time reversal of the original—that starts in the high-entropy state and evolves toward the low-entropy state. In the space of all things that can happen over time, there are precisely as many examples of entropy starting high and decreasing as there are examples of entropy starting low and increasing. In Figure 45, showing the space of states divided up into macrostates, we illustrated a trajectory emerging from a very low-entropy macrostate; but trajectories don’t just pop into existence. That history had to come from somewhere, and that somewhere had to have higher entropy—an explicit example of a path along which entropy decreased. It is manifestly impossible to prove that entropy always increases, if you believe in time-reversal-invariant dynamics (as they all did).141
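Loschmidt's counting can be checked by brute force in a toy world. The model below is an assumption of this sketch, not something from the debate itself: three particles hop around a ring of six sites at velocity +1 or -1, the macrostate is the number of particles in the left half of the ring, and reversing time means flipping every velocity. All names are illustrative.

```python
from itertools import product
from math import comb

L_SITES = 6   # sites on the ring (toy assumption)
N_PART = 3    # number of particles (toy assumption)
STEPS = 2     # how long we let each microstate evolve

def multiplicity(state):
    """Number of ways to choose which particles sit in the left half.

    The full microstate count for the macrostate is this times a constant,
    so comparing multiplicities is the same as comparing entropies.
    """
    n_left = sum(pos < L_SITES // 2 for pos, _vel in state)
    return comb(N_PART, n_left)

def evolve(state, steps):
    """Reversible rule: each particle moves `steps` sites in its own direction."""
    return tuple(((pos + vel * steps) % L_SITES, vel) for pos, vel in state)

single_particle = [(pos, vel) for pos in range(L_SITES) for vel in (-1, 1)]

ups = downs = same = 0
for state in product(single_particle, repeat=N_PART):
    before = multiplicity(state)
    after = multiplicity(evolve(state, STEPS))
    if after > before:
        ups += 1
    elif after < before:
        downs += 1
    else:
        same += 1

print(ups, downs, same)
```

Every entropy-increasing history has a velocity-flipped partner that runs the same film backward, so the two tallies have to match exactly, whatever values are chosen for the ring size or the number of steps.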

But Boltzmann had proven something—there were no mathematical or logical errors in his arguments, as far as anyone could tell. It would appear that he must have smuggled in some assumption of time asymmetry, even if it weren’t explicitly stated.

And indeed he had. A crucial step in Boltzmann’s reasoning was the assumption of molecular chaos—in German, the Stosszahlansatz, translated literally as “collision number hypothesis.” It amounts to assuming that there are no sneaky conspiracies in the motions of individual molecules in the gas. But a sneaky conspiracy is precisely what is required for the entropy to decrease! So Boltzmann had effectively proven that entropy could increase only by dismissing the alternative possibilities from the start. In particular, he had assumed that the momenta of every pair of particles were uncorrelated before they collided. But that “before” is an explicitly time-asymmetric step; if the particles really were uncorrelated before a collision, they would generally be correlated afterward. That’s how an irreversible assumption was sneaked into the proof.

If we start a system in a low-entropy state and allow it to evolve to a high-entropy state (let an ice cube melt, for example), there will certainly be a large number of correlations between the molecules in the system once all is said and done. Namely, there will be correlations that guarantee that if we reversed all the momenta, the system would evolve back to its low-entropy beginning state. Boltzmann’s analysis didn’t account for this possibility. He proved that entropy would never decrease, if we neglected those circumstances under which entropy would decrease.


Ultimately, it’s perfectly clear what the resolution to these debates must be, at least within our observable universe. Loschmidt is right in that the set of all possible evolutions has entropy decreasing as often as it is increasing. But Boltzmann is also right, that statistical mechanics explains why low-entropy conditions will evolve into high-entropy conditions with overwhelming probability. The conclusion should be obvious: In addition to the dynamics controlled by the laws of physics, we need to assume that the universe began in a low-entropy state. That is a boundary condition, an extra assumption, not part of the laws of physics themselves. (At least, not until we start talking about what happened before the Big Bang, which is not a discussion one could have had in the 1870s.) Unfortunately, that conclusion didn’t seem sufficient to people at the time, and subsequent years have seen confusions about the status of the H-Theorem proliferate beyond reason.

In 1876, Boltzmann wrote a response to Loschmidt’s reversibility objection, which did not really clarify the situation. Boltzmann certainly understood that Loschmidt had a point, and admitted that there must be something undeniably probabilistic about the Second Law; it couldn’t be absolute, if kinetic theory were true. At the beginning of his paper, he makes this explicit:

Since the entropy would decrease as the system goes through this sequence in reverse, we see that the fact that entropy actually increases in all physical processes in our own world cannot be deduced solely from the nature of the forces acting between the particles, but must be a consequence of the initial conditions.

We can’t ask for a more unambiguous statement than that: “the fact that entropy increases in our own world . . . must be a consequence of the initial conditions.” But then, still clinging to the idea of proving something without relying on initial conditions, he immediately says this:

Nevertheless, we do not have to assume a special type of initial condition in order to give a mechanical proof of the Second Law, if we are willing to accept a statistical viewpoint.

“Accepting a statistical viewpoint” presumably means that he admits we can argue only that increasing entropy is overwhelmingly likely, not that it always happens. But what can he mean by now saying that we don’t have to assume a special type of initial condition? The next sentences confirm our fears:

While any individual non-uniform state (corresponding to low entropy) has the same probability as any individual uniform state (corresponding to high entropy), there are many more uniform states than non-uniform states. Consequently, if the initial state is chosen at random, the system is almost certain to evolve into a uniform state, and entropy is almost certain to increase.

That first sentence is right, but the second is surely wrong. If an initial state is chosen at random, it is not “almost certain to evolve into a uniform state”; rather, it is almost certain to be in a uniform (high-entropy) state. Among the small number of low-entropy states, almost all of them evolve toward higher-entropy states. In contrast, only a very tiny fraction of high-entropy states will evolve toward low-entropy states; however, there are a fantastically larger number of high-entropy states to begin with. The total number of low-entropy states that evolve to high entropy is equal, as Loschmidt argued, to the total number of high-entropy states that evolve to low entropy.
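The lopsidedness of those numbers can be worked out exactly for the divided box of gas with 2,000 particles used elsewhere in this chapter. This is a sketch under stated assumptions: every microstate (an assignment of each particle to the left or right side) is equally likely, and the cutoffs for "nearly uniform" (within 10 percent of an even split) and "low entropy" (at least 80 percent on one side) are my own illustrative choices.

```python
from fractions import Fraction
from math import comb

N = 2000
TOTAL_MICROSTATES = 2 ** N  # each particle is on the left or on the right

def fraction_of_microstates(lo: int, hi: int) -> Fraction:
    """Exact fraction of all microstates with between lo and hi particles on the left."""
    count = sum(comb(N, n_left) for n_left in range(lo, hi + 1))
    return Fraction(count, TOTAL_MICROSTATES)

# "Nearly uniform": between 900 and 1,100 particles on the left (illustrative cutoff).
near_even = fraction_of_microstates(900, 1100)

# "Low entropy": at least 80 percent of the particles on one side or the other.
lopsided = fraction_of_microstates(0, 400) + fraction_of_microstates(1600, N)

print(float(near_even))
print(float(lopsided))
```

A state picked at random is all but guaranteed to land in the nearly uniform group, which is the sense in which a random initial state is already high-entropy rather than on its way there.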

Reading through Boltzmann’s papers, one gets a strong impression that he was several steps ahead of everyone else—he saw the ins and outs of all the arguments better than any of his interlocutors. But after zooming through the ins and outs, he didn’t always stop at the right place; moreover, he was notoriously inconsistent about the working assumptions he would adopt from paper to paper. We should cut him some slack, however, since here we are 140 years later and we still don’t agree on the best way of talking about entropy and the Second Law.


Within our observable universe, the consistent increase of entropy and the corresponding arrow of time cannot be derived from the underlying reversible laws of physics alone. They require a boundary condition at the beginning of time. To understand why the Second Law works in our real world, it is not sufficient to simply apply statistical reasoning to the underlying laws of physics; we must also assume that the observable universe began in a state of very low entropy. David Albert has helpfully given this assumption a simple name: the Past Hypothesis.142

The Past Hypothesis is the one profound exception to the Principle of Indifference that we alluded to above. The Principle of Indifference would have us imagine that, once we know a system is in some certain macrostate, we should consider every possible microstate within that macrostate to have an equal probability. This assumption turns out to do a great job of predicting the future on the basis of statistical mechanics. But it would do a terrible job of reconstructing the past, if we really took it seriously.

Boltzmann has told us a compelling story about why entropy increases: There are more ways to be high entropy than low entropy, so most microstates in a low-entropy macrostate will evolve toward higher-entropy macrostates. But that argument makes no reference to the direction of time. Following that logic, most microstates within some macrostate will increase in entropy toward the future but will also have evolved from a higher-entropy condition in the past.

Consider all the microstates in some medium-entropy macrostate. The overwhelming majority of those states have come from prior states of high entropy. They must have, because there aren’t that many low-entropy states from which they could have come. So with high probability, a typical medium-entropy microstate appears as a “statistical fluctuation” from a higher-entropy past. This argument is exactly the same argument that entropy should increase into the future, just with the time direction reversed.

As an example, consider the divided box of gas with 2,000 particles. Starting from a low-entropy condition (80 percent of the particles on one side), the entropy tends to go up, as plotted in Figure 43. But in Figure 47 we show how the entropy evolves to the past as well as to the future. Since the underlying dynamical rule (“each particle has a 0.5 percent chance of changing sides per second”) doesn’t distinguish between directions of time, it’s no surprise that the entropy is higher in the past of that special moment just as it is in the future.
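A curve of this kind can be reproduced with a short simulation. The sketch below takes the rule quoted above at face value, with one time step per second, and makes one further assumption worth flagging: because the rule does not distinguish directions of time, the "past" of the special moment is generated by running the very same rule away from it and then reversing the record. Function and variable names are illustrative.

```python
import random
from math import comb, log

N = 2000           # particles in the divided box
P_SWITCH = 0.005   # chance per second that a given particle changes sides
N_START = 1600     # 80 percent on the left at the special moment

def entropy(n_left: int) -> float:
    """Log of the number of arrangements with n_left particles on the left (k = 1)."""
    return log(comb(N, n_left))

def step(n_left: int, rng: random.Random) -> int:
    """Advance one second: each particle independently switches sides with P_SWITCH."""
    to_right = sum(rng.random() < P_SWITCH for _ in range(n_left))
    to_left = sum(rng.random() < P_SWITCH for _ in range(N - n_left))
    return n_left - to_right + to_left

def run(n_start: int, n_steps: int, rng: random.Random) -> list:
    """Record the number of particles on the left at each second."""
    history = [n_start]
    for _ in range(n_steps):
        history.append(step(history[-1], rng))
    return history

rng = random.Random(42)
future = run(N_START, 500, rng)
past = run(N_START, 500, rng)   # same rule, read backward in time (assumption)

# Stitch the two halves together so the special moment sits at time = 500.
timeline = past[::-1] + future[1:]
entropies = [entropy(n_left) for n_left in timeline]

print(entropies[0], entropies[500], entropies[-1])
```

Nothing in `step` knows which way time is pointing; the only thing that singles out time = 500 is that we insisted on the lopsided condition there.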

You may object, thinking that it’s very unlikely that a system would start out in equilibrium and then dive down to a low-entropy state. That’s certainly true; it would be much more likely to remain at or near equilibrium. But given that we insist on having a low-entropy state at all, it is overwhelmingly likely that such a state represents a minimum on the entropy curve, with higher entropy both to the past and to the future.


Figure 47: The entropy of a divided box of gas. The “boundary” condition is set at time = 500, where 80 percent of the particles are on one side and 20 percent on the other (a low-entropy macrostate). Entropy increases both to the future and to the past of that moment.

At least, it would be overwhelmingly likely, if all we had to go on were the Principle of Indifference. The problem is, no one in the world thinks that the entropy of the real universe behaves as shown in Figure 47. Everyone agrees that the entropy will be higher tomorrow than it is today, but nobody thinks it was higher yesterday than it is today. There are good reasons for that agreement, as we’ll discuss in the next chapter—if we currently live at a minimum of the entropy curve, all of our memories of the past are completely unreliable, and we have no way of making any kind of sense of the universe.

So if we care about what actually happens in the world, we have to supplement the Principle of Indifference with the Past Hypothesis. When it comes to picking out microstates within our macrostate, we do not assign every one equal probability: We choose only those microstates that are compatible with a much lower-entropy past (a very tiny fraction), and take all of those to have equal probability.143

But this strategy leaves us with a question: Why is the Past Hypothesis true? In Boltzmann’s time, we didn’t know anything about general relativity or the Big Bang, much less quantum mechanics or quantum gravity. But the question remains with us, only in a more specific form: Why did the universe have a low entropy near the Big Bang?