From Eternity to Here: The Quest for the Ultimate Theory of Time - Sean Carroll (2010)



You should call it entropy, for two reasons. In the first place, your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, no one knows what entropy really is, so in a debate you will always have the advantage.

—John von Neumann, to Claude Shannon144

In a celebrated episode in Swann’s Way, Marcel Proust’s narrator is feeling cold and somewhat depressed. His mother offers him tea, which he reluctantly accepts. He is then pulled into an involuntary recollection of his childhood by the taste of a traditional French teatime cake, the madeleine.

And suddenly the memory appeared. That taste was the taste of the little piece of madeleine which on Sunday mornings at Combray . . . when I went to say good morning to her in her bedroom, my aunt Léonie would give me after dipping it in her infusion of tea or lime blossom . . . And as soon as I had recognized the taste of the piece of madeleine dipped in lime-blossom tea that my aunt used to give me . . . immediately the old gray house on the street, where her bedroom was, came like a stage set to attach itself to the little wing opening onto a garden that had been built for my parents behind it . . . ; and with the house the town, from morning to night and in all weathers, the Square, where they sent me before lunch, the streets where I went on errands, the paths we took if the weather was fine.145

Swann’s Way is the first of the seven volumes of À la recherche du temps perdu, which translates into English as In Search of Lost Time. But C. K. Scott Moncrieff, the original translator, borrowed a line from Shakespeare’s Sonnet 30 to render Proust’s novel as Remembrance of Things Past.

The past, of course, is a natural thing to have remembrances of. What else would we be remembering, anyway? Surely not the future. Of all the ways in which the arrow of time manifests itself, memory—and in particular, the fact that it applies to the past but not the future—is the most obvious, and the most central to our lives. Perhaps the most important difference between our experience of one moment and our experience of the next is the accumulation of memories, propelling us forward in time.

My stance so far has been that all the important ways in which the past differs from the future can be traced to a single underlying principle, the Second Law of Thermodynamics. This implies that our ability to remember the past but not the future must ultimately be explained in terms of entropy, and in particular by recourse to the Past Hypothesis that the early universe was in a very low-entropy state. Examining how that works will launch us on an exploration of the relationship between entropy, information, and life.


One of the problems in talking about “memory” is that there’s a lot we don’t understand about how the human brain actually works, not to mention the phenomenon of consciousness.146 For our present purposes, however, that’s not a significant handicap. When we talk about remembering the past, we’re interested not specifically in the human experience of memory, but in the general notion of reconstructing past events from the present state of the world. We don’t lose anything by considering well-understood mechanical recording devices, or even such straightforward artifacts as photographs or history books. (We are making an implicit assumption that human beings are part of the natural world, and in particular that our minds can in principle be understood in terms of our brains, which obey the laws of physics.)

So let’s imagine you have in your possession something you think of as a reliable record of the past: for example, a photograph taken of your tenth birthday party. You might say to yourself, “I can be confident that I was wearing a red shirt at my tenth birthday party, because this photograph of that event shows me wearing a red shirt.” Put aside any worries that you might have over whether the photo has been tampered with or otherwise altered. The question is, what right do we have to conclude something about the past from the existence of this photo in the present?

In particular, let’s imagine that we did not buy into this Past Hypothesis business. All we have is some information about the current macrostate of the universe, including the fact that it has this particular photo, and we have certain memories and so on. We certainly don’t know the current microstate—we don’t know the position and momentum of every particle in the world—but we can invoke the Principle of Indifference to assign equal probability to every microstate compatible with the macrostate. And, of course, we know the laws of physics—maybe not the complete Theory of Everything, but enough to give us a firm handle on our everyday world. Are those—the present macrostate including the photo, plus the Principle of Indifference, plus the laws of physics—enough to conclude with confidence that we really were wearing a red shirt at our tenth birthday party?

Not even close. We tend to think that they are, without really worrying about the details too much as we get through our lives. Roughly speaking, we figure that a photograph like that is a highly specific arrangement of its constituent molecules. (Likewise for a memory in our brain of the same event.) It’s not as if those molecules are just going to randomly assemble themselves into the form of that particular photo—that’s astronomically unlikely. If, however, there really was an event in the past corresponding to the image portrayed in the photo, and someone was there with a camera, then the existence of the photo becomes relatively likely. It’s therefore very reasonable to conclude that the birthday party really did happen in the way seen in the photo.

All of those statements are reasonable, but the problem is that they are not nearly enough to justify the final conclusion. The reason is simple, and precisely analogous to our discussion of the box of gas at the end of the last chapter. Yes, the photograph is a very specific and unlikely arrangement of molecules. However, the story we are telling to “explain” it—an elaborate reconstruction of the past, involving birthday parties and cameras and photographs surviving essentially undisturbed to the present day—is even less likely than the photo all by itself. At least, if “likely” is judged by assuming that all possible microstates consistent with our current macrostate have an equal probability—which is precisely what we assumed.

Think of it this way: You would never think to appeal to some elaborate story in the future in order to explain the existence of a particular artifact in the present. If we ask about the future of our birthday photo, we might have some plans to frame it or whatnot, but we’ll have to admit to a great deal of uncertainty—we could lose it, it could fall into a puddle and decay, or it could burn in a fire. Those are all perfectly plausible extrapolations of the present state into the future, even with the specific anchor point provided by the photo here in the present. So why are we so confident about what the photo implies concerning the past?


Figure 48: Trajectories through (part of) state space, consistent with our present macrostate. We can reconstruct the past accurately only by assuming a Past Hypothesis, in addition to knowledge of our current macrostate.

The answer, of course, is the Past Hypothesis. We don’t really apply the Principle of Indifference to the current macrostate of the world—we only consider those microstates that are compatible with a very low-entropy past. And that makes all the difference when drawing inferences about the meaning of photographs or memories or other sorts of records. If we ask, “What is the most likely way, in the space of all possible evolutions of the universe, to get this particular photograph?” the answer is that it is most likely to evolve as a random fluctuation from a higher-entropy past—by exactly the same arguments that convince us it is likely to evolve toward a high-entropy future. But if instead we ask, “What is the most likely way, in the space of all evolutions of the universe from a very low-entropy beginning, to get this particular photograph?” then we find very naturally that it is most likely to go through the intermediate steps of an actual birthday party, a red shirt, a camera, and all the rest. Figure 48 illustrates the general principle—by demanding that our history stretch from a low-entropy beginning to here, we dramatically restrict the space of allowed trajectories, leaving us with those for which our records are (for the most part) reliable reflections of the past.


I know from experience that not everyone is convinced by this argument. One stumbling block is the crucial assertion that what we start with is knowledge of our present macrostate, including some small-scale details about a photograph or a history book or a memory lurking in our brains. Although it seems like a fairly innocent assumption, we have an intuitive feeling that we don’t know something only about the present; we know something about the past, because we see it, in a way that we don’t see the future. Cosmology is a good example, just because the speed of light plays an important role, and we have a palpable sense of “looking at an event in the past.” When we try to reconstruct the history of the universe, it’s tempting to look at (for example) the cosmic microwave background and say, “I can see what the universe was like almost 14 billion years ago; I don’t have to appeal to any fancy Past Hypothesis to reason my way into drawing any conclusions.”

That’s not right. When we look at the cosmic microwave background (or light from any other distant source, or a photograph of any purported past event), we’re not looking at the past. We’re observing what certain photons are doing right here and now. When we scan our radio telescope across the sky and observe a bath of radiation at about 2.7 Kelvin that is very close to uniform in every direction, we’ve learned something about the radiation passing through our present location, which we then need to extrapolate backward to infer something about the past. It’s conceivable that this uniform radiation came from a past that was actually highly non-uniform, but from which a set of finely tuned conspiracies between temperatures and Doppler shifts and gravitational effects produced a very smooth-looking set of photons arriving at us today. You may say that’s very unlikely, but the time-reverse of that is exactly what we would expect if we took a typical microstate within our present macrostate and evolved it toward a Big Crunch. The truth is, we don’t have any more direct empirical access to the past than we have to the future, unless we allow ourselves to assume a Past Hypothesis.

Indeed, the Past Hypothesis is more than just “allowed”; it’s completely necessary, if we hope to tell a sensible story about the universe. Imagine that we simply refused to invoke such an idea and stuck solely with the data given to us by our current macrostate, including the state of our brains and our photographs and our history books. We would then predict with strong probability that the past as well as the future was a higher-entropy state, and that all of the low-entropy features of our present condition arose as random fluctuations. That sounds bad enough, but the reality is worse. Under such circumstances, among the things that randomly fluctuated into existence are all of the pieces of information we traditionally use to justify our understanding of the laws of physics, or for that matter all of the mental states (or written-down arguments) we traditionally use to justify mathematics and logic and the scientific method. Such assumptions, in other words, give us absolutely no reason to believe we have justified anything, including those assumptions themselves.

David Albert has referred to such a conundrum as cognitive instability—the condition we face when a set of assumptions undermines the reasons we might have used to justify those very assumptions.147 It is a kind of helplessness that can’t be escaped without reaching beyond the present moment. Without the Past Hypothesis, we simply can’t tell any intelligible story about the world; so we seem to be stuck with it, or stuck with trying to find a theory that actually explains it.


There is a dramatic temporal asymmetry in this story of how we use memories and records: We invoke a Past Hypothesis but not a future one. In making predictions, we do not throw away any microstates consistent with our current macrostate on the grounds that they are incompatible with any particular future boundary condition. What if we did? In Chapter Fifteen we will examine the Gold cosmology, in which the universe eventually stops expanding and begins to re-collapse, while the arrow of time reverses itself and entropy begins to decrease as we approach the Big Crunch. In that case there would be no overall difference between the collapsing phase and the expanding phase we find ourselves in today—they are identical (at least statistically). Observers who lived in the collapsing phase wouldn’t think anything was funny about their universe, any more than we do; they would think that we were evolving backward in time.

It’s more illuminating to consider the ramifications of a minor restriction on allowed trajectories into our nearby future. This is essentially the situation we would face if we had a reliable prophecy of future events. When Harry Potter learns that either he will kill Voldemort or Voldemort will kill him, that places a very tight restriction on the allowed space of states.148

Craig Callender tells a vivid story about what a future boundary condition would be like. Imagine that an oracle with an impeccable track record (much better than Professor Trelawney from the Harry Potter books) tells you that all of the world’s Imperial Fabergé eggs will end up in your dresser drawer, and that when they get there your life will end. Not such a believable prospect, really—you’re not even especially fond of Russian antiques, and now you know better than to let any into your bedroom. But somehow, through a series of unpredictable and unlikely fluke occurrences, those eggs keep finding a way into your drawer. You lock it, but the lock jiggles open; you inform the eggs’ owners to keep them where they are, but thieves and random accidents conspire to gradually collect them all in your room. You get a package that was mistakenly delivered to your address—it was supposed to go to the museum—and you open it to find an egg inside. In a panic, you throw it out the window, but the egg bounces off a street lamp at a crazy angle and careens back into your room to land precisely in your dresser drawer. And then you have a heart attack and die.149

Throughout this chain of events, no laws of physics are broken along the way. At every step, events occur that are not impossible, just extremely unlikely. As a result, our conventional notions of cause and effect are overturned. We operate in our daily lives with a deep-seated conviction that causes precede effects: “There is a broken egg on the floor because I just dropped it,” not “I just dropped that egg because there was going to be a broken egg on the floor.” In the social sciences, where the causal relationship between different features of the social world can be hard to ascertain, this intuitive feeling has been elevated to the status of a principle. When two properties are highly correlated with each other, it’s not always obvious which is the cause and which is the effect, or whether both are caused by a different effect altogether. If you find that people who are happier in their marriages tend to eat more ice cream, is that because ice cream improves marriage, or happiness leads to more ice-cream eating? But there is one case where you know for sure: When one of the properties comes before the other one in time. Your grandparents’ level of educational attainment may affect your adult income, but your income doesn’t change your grandparents’ education.150

Future boundary conditions overturn this understanding of cause and effect by insisting that some specific otherwise-unlikely things are necessarily going to happen. The same holds for the idea of free will. Ultimately, our ability to “choose” how to act in the future is a reflection of our ignorance concerning the specific microstate of the universe; if Laplace’s Demon were around, he would know exactly how we are going to act. A future boundary condition is a form of predestination.

All of which may seem kind of academic and not worth dwelling on, for the basic reason that we don’t think there is any kind of future boundary condition that restricts our current microstate, and therefore we believe that causes precede effects. But we have no trouble believing in a past condition that restricts our current microstate. The microscopic laws of physics draw no distinction between past and future, and the idea that one event “causes” another or that we can “choose” different actions in the future in a way that we can’t in the past is nowhere to be found therein. The Past Hypothesis is necessary to make sense of the world around us, but it has a lot to answer for.


Let’s shift gears a bit to return to the thought-experiment playground of nineteenth-century kinetic theory. Ultimately this will lead us to the connection between entropy and information, which will circle back to illuminate the question of memory.

Perhaps the most famous thought experiment in all of thermodynamics is Maxwell’s Demon. James Clerk Maxwell proposed his Demon—more famous than Laplace’s, and equally menacing in its own way—in 1867, when the atomic hypothesis was just beginning to be applied to problems of thermodynamics. Boltzmann’s first work on the subject wasn’t until the 1870s, so Maxwell didn’t have recourse to the definition of entropy in the context of kinetic theory. But he did know about Clausius’s formulation of the Second Law: When two systems are in contact, heat will tend to flow from the hotter to the cooler, bringing both temperatures closer to equilibrium. And Maxwell knew enough about atoms to understand that “temperature” measures the average kinetic energy of the atoms. But with his Demon, he seemed to come up with a way to increase the difference in temperature between two systems, without injecting any energy—in apparent violation of the Second Law.

The setup is simple: the same kind of box of gas divided into two sides that we’re very familiar with by now. But instead of a small opening that randomly lets molecules pass back and forth, there’s a small opening with a very tiny door—one that can be opened and closed without exerting a noticeable amount of energy. At the door sits a Demon, who monitors all of the molecules on either side of the box. If a fast-moving molecule approaches from the right, the Demon lets it through to the left side of the box; if a slow-moving molecule approaches from the left, the Demon lets it through to the right. But if a slow-moving molecule approaches from the right, or a fast-moving one from the left, the Demon shuts the door so they stay on the side they’re on.

It’s clear what will happen: Gradually, and without any energy being exerted, the high-energy molecules will accumulate on the left, and the low-energy ones on the right. If the temperatures on both sides of the box started out equal, they will gradually diverge—the left will get hotter, and the right will get cooler. But that’s in direct violation of Clausius’s formulation of the Second Law. What’s going on?
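The sorting rule is simple enough to try out in a toy simulation. What follows is only a sketch under loud assumptions: every molecule keeps a fixed kinetic energy (no collisions), "fast" just means "above the median energy," and the function name `run_demon` and all of its parameters are invented for this illustration.

```python
import random

def run_demon(n_molecules=2000, n_steps=20000, seed=0):
    """Toy Maxwell's Demon. Returns ((left, right) mean energy before,
    (left, right) mean energy after) the Demon has been at work."""
    rng = random.Random(seed)
    # Assumption: each molecule has a fixed kinetic energy in arbitrary units.
    energies = [rng.expovariate(1.0) for _ in range(n_molecules)]
    sides = [rng.choice("LR") for _ in range(n_molecules)]
    # Assumption: the Demon calls a molecule "fast" if it is above the median.
    threshold = sorted(energies)[n_molecules // 2]

    def mean_energy(side):
        values = [e for e, s in zip(energies, sides) if s == side]
        return sum(values) / len(values)

    before = (mean_energy("L"), mean_energy("R"))
    for _ in range(n_steps):
        i = rng.randrange(n_molecules)      # a molecule arrives at the door
        fast = energies[i] > threshold
        if sides[i] == "R" and fast:
            sides[i] = "L"                  # fast from the right: let it through
        elif sides[i] == "L" and not fast:
            sides[i] = "R"                  # slow from the left: let it through
        # any other molecule finds the door shut and stays where it is
    after = (mean_energy("L"), mean_energy("R"))
    return before, after

before, after = run_demon()
print("mean energy (left, right) before:", before)
print("mean energy (left, right) after: ", after)
```

No molecule's energy is ever changed, so the total energy in the box stays the same; only its distribution between the two sides shifts, with the left side ending up hotter.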

If we started in a high-entropy state, with the gas at equal temperature throughout the box, and we evolve reliably (for any beginning state, not just some finely tuned ones) into a lower-entropy state, we have a situation in which a large number of initial states all evolve into a small number of final states. But that simply can’t happen, if the dynamical laws are information-conserving and reversible. There’s no room for all of those initial states to be squeezed into the smaller number of final states. So clearly there has to be a compensating increase in entropy somewhere, if the entropy in the gas goes down. And there’s only one place that entropy could go: into the Demon.
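The counting argument can be checked on a toy state space. In this sketch, a "reversible law" is modeled as a permutation of 1,000 labeled states and an "irreversible law" as a many-to-one map; both are stand-ins chosen for illustration, not models of any real gas.

```python
import random

def evolve_set(states, step):
    """Apply one time step to every state in a set and collect the results."""
    return {step[s] for s in states}

rng = random.Random(1)
n_states = 1000

# Reversible dynamics: a permutation, so each state has exactly one predecessor.
reversible_step = list(range(n_states))
rng.shuffle(reversible_step)

# Irreversible dynamics (for contrast): ten different states share each successor.
irreversible_step = [s // 10 for s in range(n_states)]

initial = set(rng.sample(range(n_states), 300))
final = evolve_set(initial, reversible_step)
squeezed = evolve_set(initial, irreversible_step)

print(len(initial), len(final), len(squeezed))
```

Under the permutation, 300 initial states always land on 300 distinct final states. Only the many-to-one map can squeeze them into fewer, which is the sense in which a reversible law leaves "no room."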


Figure 49: By letting high-energy molecules move from the right half of the box to the left, and slow-moving molecules move from the left to the right, Maxwell’s Demon lets heat flow from a cold system to a hotter one, in apparent violation of the Second Law.

The question is, how does that work? It doesn’t look like the Demon increased in entropy; at the start of the experiment it’s sitting there peacefully, waiting for the right molecules to come along, and at the end of the experiment it’s still sitting there, just as peacefully. The embarrassing fact is that it took a long time—more than a century—for scientists to really figure out the right way to think about this problem. Hungarian-American physicist Leó Szilárd and French physicist Léon Brillouin—both of whom were pioneers in applying the new science of quantum mechanics to problems of practical interest—helped pinpoint the crucial relationship between the information gathered by the Demon and its entropy. But it wasn’t until the contributions of two different physicist/computer scientists who worked for IBM, Rolf Landauer in 1961 and Charles Bennett in 1982, that it finally became clear why exactly the Demon’s entropy must always increase in accordance with the Second Law.151


Many attempts to understand Maxwell’s Demon concentrated on the means by which it measured the velocities of the molecules zooming around its vicinity. One of the big conceptual leaps of Landauer and Bennett was to focus on the means by which the Demon recorded that information. After all, the Demon has to remember—even if just for a microsecond—which molecules to let by, and which to keep on their original sides. Indeed, if the Demon simply knew from the start which molecules had which velocities, it wouldn’t have to do any measurements at all; so the crux of the problem can’t be in the measurement process.

So we have to equip the Demon with some way to record the velocities of all the molecules—perhaps it carries around a notepad, which for convenience we can imagine has just enough room to record all of the relevant information. (Nothing changes if we consider larger or smaller pads, as long as the pad is not infinitely big.) That means that the state of the notepad must be included when we calculate the entropy of the combined gas/Demon system. In particular, the notepad must start out blank, in order to be ready to record the velocities of the molecules.

But a blank notepad is, of course, nothing other than a low-entropy past boundary condition. It’s just the Maxwell’s Demon version of the Past Hypothesis, sneaked in under another guise. If that’s the case, the entropy of the combined gas/Demon system clearly wasn’t as high as it could have been. The Demon doesn’t lower the entropy of the combined system; it simply transfers the entropy from the state of the gas to the state of the notepad.

You might be suspicious of this argument. After all, you might think, can’t the Demon just erase the notepad when all is said and done? And wouldn’t that return the notepad to its original state, while the gas went down in entropy?

This is the crucial insight of Landauer and Bennett: No, you can’t just erase the notepad. At least, you can’t erase information if you are part of a closed system operating under reversible dynamical laws. When phrased that way, the result is pretty believable: If you were able to erase the information entirely, how would you ever be able to reverse the evolution to its previous state? If erasure is possible, either the fundamental laws are irreversible—in which case it’s not at all surprising that the Demon can lower the entropy—or you’re not really in a closed system. The act of erasing information necessarily transfers entropy to the outside world. (In the case of real-world erasing of actual pencil markings, this entropy comes mostly in the form of heat, dust, and tiny flecks of rubber.)

So you have two choices. Either the Demon starts with a blank low-entropy notepad, in a demonic version of the Past Hypothesis, and simply transfers entropy from the gas to the notepad; or the Demon needs to erase information from the notepad, in which case entropy gets transferred to the outside world. In either case, the Second Law is safe. But along the way, we’ve opened the door to the fascinating connection between information and entropy.


Even though we’ve tossed around the word information a lot in discussing dynamical laws of physics—reversible laws conserve information—the concept still seems a bit abstract compared to the messy world of energy and heat and entropy. One of the lessons of Maxwell’s Demon is that this is an illusion: Information is physical. More concretely, possessing information allows us to extract useful work from a system in ways that would have otherwise been impossible.

Leó Szilárd showed this explicitly in a simplified model of Maxwell’s Demon. Imagine that our box of gas contained just a single molecule; the “temperature” would just be the energy of that one gas molecule. If that’s all we know, there’s no way to use that molecule to do useful work; the molecule just rattles around like a pebble in a can. But now imagine that we have a single bit of information: whether the molecule is on the left side of the box or the right. With that, plus some clever thought-experiment-level manipulation, we can use the molecule to do work. All we have to do is quickly insert a piston into the other half of the box. The molecule will bump into it, pushing the piston, and we can use the external motion to do something useful, like turn a flywheel.152

Note the crucial role played by information in Szilárd’s setup. If we didn’t know which half of the box the molecule was in, we wouldn’t know where to insert the piston. If we inserted it randomly, half the time it would be pushed out and half the time it would be pulled in; on average, we wouldn’t be getting any useful work at all. The information in our possession allowed us to extract energy from what appeared to be a system at maximal entropy.
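To put a number on what one bit is worth, here is a small sketch. It assumes the standard textbook idealization, which the discussion above does not spell out: the one-molecule gas stays in contact with a heat bath at temperature T and pushes the piston slowly until it fills the whole box, so the work is k T ln 2. The function names, and the idea of charging a wrong guess the same amount of work with the opposite sign, are assumptions made for illustration.

```python
import math

K_B = 1.380649e-23  # Boltzmann's constant in joules per kelvin (exact SI value)

def szilard_work(temperature_kelvin, volume_ratio=2.0):
    """Work from a slow isothermal expansion of a one-molecule ideal gas."""
    return K_B * temperature_kelvin * math.log(volume_ratio)

def expected_work(temperature_kelvin, p_correct):
    """Average work if we pick the right half of the box with probability
    p_correct. Assumption: a wrong pick costs as much as a right pick gains."""
    w = szilard_work(temperature_kelvin)
    return p_correct * w - (1.0 - p_correct) * w

print(szilard_work(300.0))        # roughly 2.9e-21 joules at room temperature
print(expected_work(300.0, 0.5))  # blind guessing: no work on average
print(expected_work(300.0, 1.0))  # one full bit of knowledge: the full k T ln 2
```

The same quantity, k T ln 2 per bit, is the minimum cost usually quoted for erasing a bit, which is why the books balance once the Demon's notepad is included.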

To be clear: In the final analysis, none of these thought experiments are letting us violate the Second Law. Rather, they provide ways that we could appear to violate the Second Law, if we didn’t properly account for the crucial role played by information. The information collected and processed by the Demon must somehow be accounted for in any consistent story of entropy.

The concrete relationship between entropy and information was developed in the 1940s by Claude Shannon, an engineer/mathematician working for Bell Labs.153 Shannon was interested in finding efficient and reliable ways of sending signals across noisy channels. He had the idea that some messages carry more effective information than others, simply because the message is more “surprising” or unexpected. If I tell you that the Sun is going to rise in the East tomorrow morning, I’m not actually conveying much information, because you already expected that was going to happen. But if I tell you the peak temperature tomorrow is going to be exactly 25 degrees Celsius, my message contains more information, because without the message you wouldn’t have known precisely what temperature to expect.

Shannon figured out how to formalize this intuitive idea of the effective information content of a message. Imagine that we consider the set of all possible messages we could receive of a certain type. (This should remind you of the “space of states” we considered when talking about physical systems rather than messages.) For example, if we are being told the outcome of a coin flip, there are only two possible messages: “heads” or “tails.” Before we get the message, either alternative is equally likely; after we get the message, we have learned precisely one bit of information.

If, on the other hand, we are being told what the high temperature will be tomorrow, there are a large number of possible messages: say, any integer between -273 and plus infinity, representing the temperature in degrees Celsius. (Minus 273 degrees Celsius is absolute zero.) But not all of those are equally likely. If it’s summer in Los Angeles, temperatures of 27 or 28 degrees Celsius are fairly common, while temperatures of -13 or +4,324 degrees Celsius are comparatively rare. Learning that the temperature tomorrow would be one of those unlikely numbers would convey a great deal of information indeed (presumably related to some global catastrophe).

Roughly speaking, then, the information content of a message goes up as the probability of a given message taking that form goes down. But Shannon wanted to be a little bit more precise than that. In particular, he wanted it to be the case that if we receive two messages that are completely independent of each other, the total information we get is equal to the sum of the information contained in each individual message. (Recall that, when Boltzmann was inventing his entropy formula, one of the properties he wanted to reproduce was that the entropy of a combined system was the sum of the entropies of the individual systems.) After some playing around, Shannon figured out that the right thing to do was to take the logarithm of the probability of receiving a given message. His final result is this: The “self-information” contained in a message is equal to minus the logarithm of the probability that the message would take that particular form.
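Shannon's definition is short enough to write down directly. This sketch measures information in bits by using a base-2 logarithm; the function name and the sample probabilities are made up for illustration.

```python
import math

def self_information(probability):
    """Self-information, in bits, of a message that had this probability."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(probability)

coin = self_information(0.5)             # a fair coin flip
rare = self_information(1.0 / 1024.0)    # a hypothetical one-in-1024 message

# Two independent messages: probabilities multiply, information adds.
both = self_information(0.5 * (1.0 / 1024.0))

print(coin, rare, both)
```

The logarithm is what turns the product of two independent probabilities into a sum of information, which is exactly the additivity Shannon wanted.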

If many of these words sound familiar, it’s not an accident. Boltzmann associated the entropy with the logarithm of the number of microstates in a certain macrostate. But given the Principle of Indifference, the number of microstates in a macrostate is clearly proportional to the probability of picking one of them randomly in the entire space of states. A low-entropy state is like a surprising, information-filled message, while knowing that you’re in a high-entropy state doesn’t tell you much at all. When all is said and done, if we think of the “message” as a specification of which macrostate a system is in, the relationship between entropy and information is very simple: The information is the difference between the maximum possible entropy and the actual entropy of the macrostate.154
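That last relationship can be checked on the familiar box of gas. The sketch below makes several assumptions for the sake of a concrete calculation: N distinguishable molecules that can each sit on the left or the right, a macrostate labeled by how many are on the left, entropy measured in units where Boltzmann's constant is 1, and "maximum possible entropy" taken to be the logarithm of the total number of microstates.

```python
import math

def macrostate_information(n_total, n_left):
    """Return (actual entropy, maximum entropy, information) for the
    macrostate 'n_left of n_total molecules are on the left side'."""
    microstates = math.comb(n_total, n_left)   # ways to realize the macrostate
    s_actual = math.log(microstates)           # Boltzmann entropy with k = 1
    s_max = n_total * math.log(2)              # log of all 2**n_total microstates
    probability = microstates / 2**n_total     # Principle of Indifference
    information = -math.log(probability)       # Shannon self-information (nats)
    return s_actual, s_max, information

for n_left in (0, 25, 50):
    s, s_max, info = macrostate_information(100, n_left)
    print(n_left, round(s, 2), round(s_max - s, 2), round(info, 2))
```

The last two printed columns agree in every row. Learning that all 100 molecules are on one side is a very informative message, while learning that they are split evenly tells you very little.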


It should come as no surprise that these ideas connecting entropy and information come into play when we start thinking about the relationship between thermodynamics and life. Not that this relationship is very straightforward; although there certainly is a close connection, scientists haven’t even yet agreed on what “life” really means, much less understood all its workings. This is an active research area, one that has seen an upsurge in recent interest, drawing together insights from biology, physics, chemistry, mathematics, computer science, and complexity studies.155

Without yet addressing the question of how “life” should be defined, we can ask what sounds like a subsequent question: Does life make thermodynamic sense? The answer, before you get too excited, is “yes.” But the opposite has been claimed—not by any respectable scientists, but by creationists looking to discredit Darwinian natural selection as the correct explanation for the evolution of life on Earth. One of their arguments relies on a misunderstanding of the Second Law, which they read as “entropy always increases,” and then interpret as a universal tendency toward decay and disorder in all natural processes. Whatever life is, it’s pretty clear that life is complicated and orderly—how, then, can it be reconciled with the natural tendency toward disorder?

There is, of course, no contradiction whatsoever. The creationist argument would equally well imply that refrigerators are impossible, so it’s clearly not correct. The Second Law doesn’t say that entropy always increases. It says that entropy always increases (or stays constant) in a closed system, one that doesn’t interact noticeably with the external world. It’s pretty obvious that life is not like that; living organisms interact very strongly with the external world. They are the quintessential examples of open systems. And that is pretty much that; we can wash our hands of the issue and get on with our lives.

But there’s a more sophisticated version of the creationist argument, which is not quite as silly—although it’s still wrong—and it’s illuminating to see exactly how it fails. The more sophisticated argument is quantitative: Sure, living beings are open systems, so in principle they can decrease entropy somewhere as long as it increases somewhere else. But how do you know that the increase in entropy in the outside world is really enough to account for the low entropy of living beings?

As I mentioned back in Chapter Two, the Earth and its biosphere are systems that are very far away from thermal equilibrium. In equilibrium, the temperature is the same everywhere, whereas when we look up we see a very hot Sun in an otherwise very cold sky. There is plenty of room for entropy to increase, and that’s exactly what’s happening. But it’s instructive to run the numbers.156


Figure 50: We receive energy from the Sun in a concentrated, low-entropy form, and radiate it back to the universe in a diffuse, high-entropy form. For every 1 high-energy photon we receive, the Earth radiates about 20 low-energy photons.

The energy budget of the Earth, considered as a single system, is pretty simple. We get energy from the Sun via radiation; we lose the same amount of energy to empty space, also via radiation. (Not exactly the same; processes such as nuclear decays also heat up the Earth and leak energy into space, and the rate at which energy is radiated is not strictly constant. Still, it’s an excellent approximation.) But while the amount is the same, there is a big difference in the quality of the energy we get and the energy we give back. Remember back in the pre-Boltzmann days, entropy was understood as a measurement of the uselessness of a certain amount of energy; low-entropy forms of energy could be put to useful work, such as powering an engine or grinding flour, while high-entropy forms of energy just sat there.

The energy we get from the Sun is of a low-entropy, useful form, while the energy we radiate back out into space has a much higher entropy. The temperature of the Sun is about 20 times the average temperature of the Earth. For radiation, the temperature is just the average energy of the photons of which it is made, so the Earth needs to radiate 20 low-energy (long-wavelength, infrared) photons for every 1 high-energy (short-wavelength, visible) photon it receives. It turns out, after a bit of math, that 20 times as many photons directly translates into 20 times the entropy. The Earth emits the same amount of energy as it receives, but with 20 times higher entropy.
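The "bit of math" can be sketched in a few lines. This is a rough illustration, not the book's own calculation: the Sun's temperature of 5,800 Kelvin is an assumed round value, the Earth's 255 Kelvin is the figure quoted later in the chapter, and the formula S = 4E/(3T) is the standard entropy of thermal radiation carrying energy E at temperature T.

```python
T_SUN = 5800.0    # kelvin; assumed round value for the Sun's surface
T_EARTH = 255.0   # kelvin; the Earth temperature quoted later in the chapter

def photons_per_unit_energy(temperature):
    """Relative photon count for a fixed energy: mean photon energy scales with T."""
    return 1.0 / temperature

def radiation_entropy(energy, temperature):
    """Entropy of thermal radiation with given energy and temperature: S = 4E / (3T)."""
    return 4.0 * energy / (3.0 * temperature)

# Same energy in and out; compare photon numbers and entropies.
photon_ratio = photons_per_unit_energy(T_EARTH) / photons_per_unit_energy(T_SUN)
entropy_ratio = radiation_entropy(1.0, T_EARTH) / radiation_entropy(1.0, T_SUN)

print(round(photon_ratio, 1), round(entropy_ratio, 1))
```

Both ratios are just the ratio of the two temperatures, which is why the photon count and the entropy go up by the same factor. With these inputs the factor comes out a little above 20, in line with the round number used in the text.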

The hard part is figuring out just what we mean when we say that the life forms here on Earth are “low-entropy.” How exactly do we do the coarse-graining? It is possible to come up with reasonable answers to that question, but it’s complicated. Fortunately, there is a dramatic shortcut we can take. Consider the entire biomass of the Earth—all of the molecules that are found in living organisms of any type. We can easily calculate the maximum entropy that collection of molecules could have, if it were in thermal equilibrium; plugging in the numbers (the biomass is 10^15 kilograms; the temperature of the Earth is 255 Kelvin), we find that its maximum entropy is 10^44. And we can compare that to the minimum entropy it could possibly have—if it were in an exactly unique state, the entropy would be precisely zero.

So the largest conceivable change in entropy that would be required to take a completely disordered collection of molecules the size of our biomass and turn them into absolutely any configuration at all—including the actual ecosystem we currently have—is 10^44. If the evolution of life is consistent with the Second Law, it must be the case that the Earth has generated more entropy over the course of life’s evolution by converting high-energy photons into low-energy ones than it has decreased entropy by creating life. The number 10^44 is certainly an overly generous estimate—we don’t have to generate nearly that much entropy, but if we can generate that much, the Second Law is in good shape.

How long does it take to generate that much entropy by converting useful solar energy into useless radiated heat? The answer, once again plugging in the temperature of the Sun and so forth, is: about 1 year. Every year, if we were really efficient, we could take an undifferentiated mass as large as the entire biosphere and arrange it in a configuration with as small an entropy as we can imagine. In reality, life has evolved over billions of years, and the total entropy of the “Sun + Earth (including life) + escaping radiation” system has increased by quite a bit. So the Second Law is perfectly consistent with life as we know it—not that you were ever in doubt.
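For readers who want to see where a timescale like that comes from, here is a back-of-the-envelope sketch. Every input is an assumption rather than a number from the text: a solar constant of 1,361 watts per square meter, an albedo of 0.3, a Sun at 5,800 Kelvin, and a reading of the biomass bound as 10^44 in units of Boltzmann's constant. The entropy of radiation is again taken to be 4E/(3T).

```python
import math

K_B = 1.380649e-23        # J/K, Boltzmann constant
SOLAR_CONSTANT = 1361.0   # W/m^2 at Earth's orbit (assumed round value)
EARTH_RADIUS = 6.371e6    # m
ALBEDO = 0.3              # fraction of sunlight reflected (assumed)
T_SUN = 5800.0            # K (assumed)
T_EARTH = 255.0           # K (quoted in the text)
SECONDS_PER_YEAR = 3.156e7
TARGET_ENTROPY = 1e44     # the text's bound, ASSUMED to be in units of k_B

# Power absorbed by the Earth's disk, then re-radiated at T_EARTH.
absorbed_power = SOLAR_CONSTANT * math.pi * EARTH_RADIUS ** 2 * (1.0 - ALBEDO)

# Net entropy produced per second: entropy out at T_EARTH minus entropy in at T_SUN.
entropy_rate = (4.0 / 3.0) * absorbed_power * (1.0 / T_EARTH - 1.0 / T_SUN)  # W/K
entropy_rate_kb = entropy_rate / K_B                                         # k_B per second

years_needed = TARGET_ENTROPY / (entropy_rate_kb * SECONDS_PER_YEAR)
print(f"{years_needed:.2f} years")
```

With these particular inputs the answer lands at a fraction of a year, within roughly an order of magnitude of the "about 1 year" quoted above. The exact figure depends on choices the text doesn't spell out, but the conclusion is insensitive to them: the timescale is tiny compared with the billions of years over which life has evolved.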


It’s good to know that life doesn’t violate the Second Law of Thermodynamics. But it would also be nice to have a well-grounded understanding of what “life” actually means. Scientists haven’t yet agreed on a single definition, but there are a number of features that are often associated with living organisms: complexity, organization, metabolism, information processing, reproduction, response to stimuli, aging. It’s difficult to formulate a set of criteria that clearly separates living beings—algae, earthworms, house cats—from complex nonliving objects—forest fires, galaxies, personal computers. In the meantime, we are able to analyze some of life’s salient features, without drawing a clear distinction between their appearance in living and nonliving contexts.

One famous attempt to grapple with the concept of life from a physicist’s perspective was the short book What Is Life? written by none other than Erwin Schrödinger. Schrödinger was one of the inventors of quantum theory; it’s his equation that replaces Newton’s laws of motion as the dynamical description of the world when we move from classical mechanics to quantum mechanics. He also originated the Schrödinger’s Cat thought experiment to highlight the differences between our direct perceptions of the world and the formal structure of quantum theory.

After the Nazis came to power, Schrödinger left Germany, but despite winning the Nobel Prize in 1933 he had difficulty in finding a permanent position elsewhere, largely because of his colorful personal life. (His wife Annemarie knew that he had mistresses, and she had lovers of her own; at the time Schrödinger was involved with Hilde March, wife of one of his assistants, who would eventually bear a child with him.) He ultimately settled in Ireland, where he helped establish an Institute for Advanced Studies in Dublin.

In Ireland Schrödinger gave a series of public lectures, which were later published as What Is Life? He was interested in examining the phenomenon of life from the perspective of a physicist, and in particular an expert on quantum mechanics and statistical mechanics. Perhaps the most remarkable thing about the book is Schrödinger’s deduction that the stability of genetic information over time is best explained by positing the existence of some sort of “aperiodic crystal” that stored the information in its chemical structure. This insight helped inspire Francis Crick to leave physics in favor of molecular biology, eventually leading to his discovery with James Watson of the double-helix structure of DNA.157

But Schrödinger also mused on how to define “life.” He made a specific proposal in that direction, which comes across as somewhat casual and offhand, and perhaps hasn’t been taken as seriously as it might have been:

What is the characteristic feature of life? When is a piece of matter said to be alive? When it goes on ‘doing something’, exchanging material with its environment, and so forth, and that for a much longer period than we would expect an inanimate piece of matter to ‘keep going’ under similar circumstances.158

Admittedly, this is a bit vague; what exactly does it mean to “keep going,” how long should we “expect” it to happen, and what counts as “similar circumstances”? Furthermore, there’s nothing in this definition about organization, complexity, information processing, or any of that.

Nevertheless, Schrödinger’s idea captures something important about what distinguishes life from non-life. In the back of his mind, he was certainly thinking of Clausius’s version of the Second Law: objects in thermal contact evolve toward a common temperature (thermal equilibrium). If we put an ice cube in a glass of warm water, the ice cube melts fairly quickly. Even if the two objects are made of very different substances—say, if we put a plastic “ice cube” in a glass of water—they will still come to the same temperature. More generally, nonliving physical objects tend to wind down and come to rest. A rock may roll down a hill during an avalanche, but before too long it will reach the bottom, dissipate energy through the creation of noise and heat, and come to a complete halt.

Schrödinger’s point is simply that, for living organisms, this process of coming to rest can take much longer, or even be put off indefinitely. Imagine that, instead of an ice cube, we put a goldfish into our glass of water. Unlike the ice cube (whether water or plastic), the goldfish will not simply equilibrate with the water—at least, not within a few minutes or even hours. It will stay alive, doing something, swimming, exchanging material with its environment. If it’s put into a lake or a fish tank where food is available, it will keep going for much longer.

And that, suggests Schrödinger, is the essence of life: staving off the natural tendency toward equilibration with one’s surroundings. At first glance, most of the features we commonly associate with life are nowhere to be found in this definition. But if we start thinking about why organisms are able to keep doing something long after nonliving things would wind down—why the goldfish is still swimming long after the ice cube would have melted—we are immediately drawn to the complexity of the organism and its capacity for processing information. The outward sign of life is the ability of an organism to keep going for a long time, but the mechanism behind that ability is a subtle interplay between numerous levels of hierarchical structure.

We would like to be a little more specific than that. It’s nice to say, “living beings are things that keep going for longer than we would otherwise expect, and the reason they can keep going is because they’re complex,” but surely there is more to the story. Unfortunately, it’s not a simple story, nor one that scientists understand very well. Entropy certainly plays a big role in the nature of life, but there are important aspects that it doesn’t capture. Entropy characterizes individual states at a single moment in time, but the salient features of life involve processes that evolve through time. By itself, the concept of entropy has only very crude implications for evolution through time: It tends to go up or stay the same, not go down. The Second Law says nothing about how fast entropy will increase, or the particular methods by which entropy will grow—it’s all about Being, not Becoming.159

Nevertheless, even without aspiring to answer all possible questions about the meaning of “life,” there is one concept that undoubtedly plays an important role: free energy. Schrödinger glossed over this idea in the first edition of What Is Life?, but in subsequent printings he added a note expressing his regret for not giving it greater prominence. The idea of free energy helps to tie together entropy, the Second Law, Maxwell’s Demon, and the ability of living organisms to keep going longer than nonliving objects.


The field of biological physics has witnessed a dramatic rise in popularity in recent years. That’s undoubtedly a good thing—biology is important, and physics is important, and there are a great number of interesting problems at the interface of the two fields. But it’s also no surprise that the field lay relatively fallow for as long as it did. If you pick up an introductory physics textbook and compare it with a biological physics text, you’ll notice a pronounced shift in vocabulary.160 Conventional introductory physics books are filled with words like force and momentum and conservation, while biophysics books feature words like entropy and information and dissipation.

This difference in terminology reflects an underlying difference in philosophy. Ever since Galileo first encouraged us to ignore air resistance when thinking about how objects fall in a gravitational field, physics has traditionally gone to great lengths to minimize friction, dissipation, noise, and anything else that would detract from the unimpeded manifestation of simple microscopic dynamical laws. In biological physics, we can’t do that; once you start ignoring friction, you ignore life itself. Indeed, that’s an alternative definition worth contemplating: Life is organized friction.

But, you are thinking, that doesn’t sound right at all. Life is all about maintaining structure and organization, whereas friction creates entropy and disorder. In fact, both perspectives capture some of the underlying truth. What life does is to create entropy somewhere, in order to maintain structure and organization somewhere else. That’s the lesson of Maxwell’s Demon.

Let’s examine what that might mean. Back when we first talked about the Second Law in Chapter Two, we introduced the distinction between “useful” and “useless” energy: Useful energy can be converted into some kind of work, while useless energy is useless. One of the contributions of Josiah Willard Gibbs was to formalize these concepts, by introducing the concept of “free energy.” Schrödinger didn’t use that term in his lectures because he worried that the connotations were confusing: The energy isn’t really “free” in the sense that you can get it for nothing; it’s “free” in the sense that it’s available to be used for some purpose.161 (Think “free speech,” not “free beer,” as free-software guru Richard Stallman likes to say.) Gibbs realized that he could use the concept of entropy to cleanly divide the total amount of energy into the useful part, which he called “free,” and the useless part:162

total energy = free energy + useless (high-entropy) energy.

When a physical process creates entropy in a system with a fixed total amount of energy, it uses up free energy; once all the free energy is gone, we’ve reached equilibrium.
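A small numerical sketch may help fix the idea. It assumes the Helmholtz form of the free energy, F = U - TS, so that the "useless" part is TS; the text doesn't say which form it has in mind, and the numbers below are invented purely for illustration.

```python
def free_energy(total_energy, temperature, entropy):
    """Free part of the energy in joules: F = U - T*S (Helmholtz form, assumed)."""
    return total_energy - temperature * entropy

def useless_energy(temperature, entropy):
    """High-entropy part of the energy in joules: T*S."""
    return temperature * entropy

U = 1000.0   # J, fixed total energy (illustrative value)
T = 300.0    # K, fixed temperature (illustrative value)

# As entropy is created at fixed total energy, the free part shrinks.
for S in (1.0, 2.0, 3.0):   # J/K
    print(S, free_energy(U, T, S), useless_energy(T, S))
```

At each step the two parts still add up to the same total; creating entropy only shifts energy from the free column to the useless one.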

That’s one way of thinking about what living organisms do: They maintain order in their local environment (including their own bodies) by taking advantage of free energy, degrading it into useless energy. If we put a goldfish in an otherwise empty container of water, it can maintain its structure (far from equilibrium with its surroundings) for a lot longer than an ice cube can; but eventually it will die from starvation. But if we feed the goldfish, it can last for a much longer time even than that. From a physics point of view, food is simply a supply of free energy, which a living organism can take advantage of to power its metabolism.

From this perspective, Maxwell’s Demon (along with his box of gas) serves as an illuminating paradigm for how life works. Consider a slightly more elaborate version of the Demon story. Let’s take the divided box of gas and embed it in an “environment,” which we model by an arbitrarily large collection of stuff at a constant temperature—what physicists call a “heat bath.” (The point is that the environment is so large that its own temperature won’t be affected by interactions with the smaller system in which we are interested, in this case the box of gas.) Even though the molecules of gas stay inside the walls of the box, thermal energy can pass in and out; therefore, even if the Demon were to segregate the gas effectively into one cool half and one hot half, the temperature would immediately begin to even out through interactions with the surrounding environment.

We imagine that the Demon would really like to keep its particular box far from equilibrium—it wants to do its best to keep the left side of the box at a high temperature and the right side at a low temperature. (Note that we have turned the Demon into a protagonist, rather than a villain.) So it has to do its traditional sorting of molecules according to their velocities, but now it has to keep doing that in perpetuity, or otherwise each side will equilibrate with its environment. By our previous discussion, the Demon can’t do its sorting without affecting the outside world; the process of erasing records will inevitably generate entropy. What the Demon requires, therefore, is a continual supply of free energy. It takes in the free energy (“food”), then takes advantage of that free energy to erase its records, generating entropy in the process and degrading the energy into uselessness; the useless energy is then discarded as heat (or whatever). With its newly erased notepad, the Demon is ready to keep its box of gas happily displaced from equilibrium, at least until it fills the notepad once more, and the cycle repeats itself.
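To put a rough price on the Demon's housekeeping, we can use Landauer's bound, which says that erasing one bit at temperature T dissipates at least kT ln 2 of energy as heat. Treating the bound as the actual cost, and picking room temperature and a billion-bit notepad, are assumptions made only for this sketch.

```python
import math

K_B = 1.380649e-23   # J/K, Boltzmann constant

def erasure_cost_joules(num_bits, temperature):
    """Minimum free energy degraded to heat when erasing num_bits (Landauer bound)."""
    return num_bits * K_B * temperature * math.log(2)

def erasure_entropy(num_bits):
    """Minimum entropy, in J/K, dumped into the environment by that erasure."""
    return num_bits * K_B * math.log(2)

ROOM_TEMPERATURE = 300.0   # K (assumed)
NOTEPAD_BITS = 10 ** 9     # hypothetical size of the Demon's notepad

print(erasure_cost_joules(NOTEPAD_BITS, ROOM_TEMPERATURE))
print(erasure_entropy(NOTEPAD_BITS))
```

The numbers are minuscule by everyday standards, but they are never zero, and the Demon has to pay them again every time the notepad fills up.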


Figure 51: Maxwell’s Demon as a paradigm for life. The Demon maintains order—a separation of temperatures—in the box, against the influence of the environment, by processing information through the transformation of free energy into high-entropy heat.

This charming vignette obviously fails to encapsulate everything we mean by the idea of “life,” but it succeeds in capturing an essential part of the bigger picture. Life strives to maintain order in the face of the demands of the Second Law, whether it’s the actual body of the organism, or its mental state, or the works of Ozymandias. And it does so in a specific way: by degrading free energy in the outside world in the cause of keeping itself far from thermal equilibrium. And that’s an operation, as we have seen, that is tightly connected to the idea of information processing. The Demon carries out its duty by converting free energy into information about the molecules in its box, which it then uses to keep the temperature in the box from evening out. At some very basic level, the purpose of life boils down to survival—the organism wants to preserve the smooth operation of its own complex structure.163 Free energy and information are the keys to making it happen.

From the point of view of natural selection, there are many reasons why a complex, persistent structure might be adaptively favored: An eye, for example, is a complex structure that clearly contributes to the fitness of an organism. But increasingly complex structures require that we turn increasing amounts of free energy into heat, just to keep them intact and functioning. This picture of the interplay of energy and information therefore makes a prediction: The more complex an organism becomes, the more inefficient it will be at using energy for “work” purposes—simple mechanical operations like running and jumping, as opposed to the “upkeep” purposes of keeping the machinery in good working condition. And indeed, that’s true; in real biological organisms, the more complex ones are correspondingly less efficient in their use of energy.164


There are any number of fascinating topics at the interface of entropy, information, life, and the arrow of time that we don’t have a chance to discuss here: aging, evolution, mortality, thinking, consciousness, social structures, and countless more. Confronting all of them would make this a very different book, and our primary goals are elsewhere. But before returning to the relatively solid ground of conventional statistical mechanics, we can close this chapter with one more speculative thought, the kind that may hopefully be illuminated by new research in the near future.

As the universe evolves, entropy increases. That is a very simple relationship: At early times, near the Big Bang, the entropy was very low, and it has grown ever since and will continue to grow into the future. But apart from entropy, we can also characterize (at least roughly) the state of the universe at any one moment in time in terms of its complexity, or by the converse of complexity, its simplicity. And the evolution of complexity with time isn’t nearly that straightforward.

There are a number of different ways we could imagine quantifying the complexity of a physical situation, but there is one measure that has become widely used, known as the Kolmogorov complexity or algorithmic complexity.165 This idea formalizes our intuition that a simple situation is easy to describe, while a complex situation is hard to describe. The difficulty we have in describing a situation can be quantified by specifying the shortest possible computer program (in some given programming language) that would generate a description of that situation. The Kolmogorov complexity is just the length of that shortest possible computer program.

Consider two strings of numbers, each a million characters long. One string consists of nothing but 8’s in every digit, while the other is some particular sequence of digits with no discernible pattern within them:


The first of these is simple—it has a low Kolmogorov complexity. That’s because it can be generated by a program that just says, “Print the number 8 a million times.” The second string, however, is complex. Any program that prints it out has to be at least one million characters long, because the only way to describe this string is to literally specify every single digit. This definition becomes helpful when we consider numbers like pi or the square root of two—they look superficially complex, but there is actually a short program in either case that can calculate them to any desired accuracy, so their Kolmogorov complexity is quite low.
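The true Kolmogorov complexity of a string can't be computed in general, but an off-the-shelf compressor gives a crude stand-in: the compressed length is an upper bound on how short a description can be. The sketch below uses that proxy, which is a swapped-in technique and not the definition itself, on a million 8's and on a million digits from a seeded random number generator.

```python
import random
import zlib

N = 1_000_000

# String 1: nothing but 8's.
simple = "8" * N

# String 2: digits with no pattern a compressor can find. It is pseudo-random,
# produced by a seeded generator, which is an assumption made for convenience.
rng = random.Random(0)
patternless = "".join(rng.choices("0123456789", k=N))

def compressed_length(text):
    """Length in bytes after zlib compression: a rough upper bound on description length."""
    return len(zlib.compress(text.encode("ascii"), 9))

simple_len = compressed_length(simple)
patternless_len = compressed_length(patternless)
print(simple_len, patternless_len)
```

The string of 8's shrinks to a tiny fraction of its size, while the patternless one stays a large fraction of the original. There is a catch that echoes the point about pi: since the second string here came from a short program plus a seed, its true Kolmogorov complexity is low as well. The compressor simply can't see that, which is a reminder that compression only ever bounds the complexity from above.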

The complexity of the early universe is low, because it’s very easy to describe. It was a hot, dense state of particles, very smooth over large scales, expanding at a certain rate, with some (fairly simple to specify) set of tiny perturbations in density from place to place. From a coarse-grained perspective, that’s the entire description of the early universe; there’s nothing else to say. Far in the future, the complexity of the universe will also be very low: It will just be empty space, with an increasingly dilute gruel of individual particles. But in between—like right now—things look extremely complicated. Even after coarse-graining, there is no simple way of expressing the hierarchical structures described by gas, dust, stars, galaxies, and clusters, much less all of the interesting things going on at much smaller scales, such as our ecosystem here on Earth.

So while the entropy of the universe increases straightforwardly from low to high as time goes by, the complexity is more interesting: It goes from low, to relatively high, and then back down to low again. And the question is: Why? Or perhaps: What are the ramifications of this form of evolution? There are a whole host of questions we can think to ask. Under what general circumstances does complexity tend to rise and then fall again? Does such behavior inevitably accompany the evolution of entropy from low to high, or are other features of the underlying dynamics necessary? Is the emergence of complexity (or “life”) a generic feature of evolution in the presence of entropy gradients? What is the significance of the fact that our early universe was simple as well as low-entropy? How long can life survive as the universe relaxes into a simple, high-entropy future?166

Science is about answering hard questions, but it’s also about pinpointing the right questions to ask. When it comes to understanding life, we’re not even sure what the right questions are. We have a bunch of intriguing concepts that we’re pretty sure will play some sort of role in an ultimate understanding—entropy, free energy, complexity, information. But we’re not yet able to put them together into a unified picture. That’s okay; science is a journey in which getting there is, without question, much of the fun.