The Bell Curve: Intelligence and Class Structure in American Life - Richard J. Herrnstein, Charles Murray (1996)
In November 1989, Richard Herrnstein and I agreed to collaborate on the book that would become The Bell Curve. It occupied both of our professional lives for the next four and a half years. On June 3, 1994, the day that we submitted the final revisions of the manuscript and the date you will find on the Acknowledgments, we learned that Richard Herrnstein had inoperable cancer. He died on September 13, 1994, after setting an example of courage and serenity that his friends can only hope to emulate in their turn.
The Bell Curve was released by the Free Press in early October 1994. The initial reaction to it was encouraging. Acting on Herrnstein’s suggestion, the American Enterprise Institute (AEI) held a small conference of academics and journalists from various points on the political spectrum. The conference went well, with brisk exchanges about a book on which people had differing opinions but which they discussed over the course of two days as a serious and careful work of scholarship. Two weeks after the conference, Malcolm Browne’s thoughtful review appeared in the New York Times Book Review.1
Then came the avalanche. It seems likely that The Bell Curve will be one of the most written-about and talked-about works of social science since the Kinsey Report fifty years ago. Most of the published reaction was virulently hostile. The book was said to be the flimsiest kind of pseudoscience. A racist screed. Designed to promote a radical political agenda. An angry book. Tainted by the work of neo-Nazis.
“Never,” my AEI colleague Michael Ledeen observed a few months after publication, “has such a moderate book attracted such an immoderate response.” It is my thought, too, as I am sure it would be Richard Herrnstein’s. If there is one objective that we shared from the beginning, it was to write a book that was relentlessly moderate—in its tone, science, and argumentation. You have in your hands the means for deciding whether you agree.
THE BELL CURVE AND POLITICS
If The Bell Curve is in fact a book of mainstream science cautiously interpreted, why did it cause such a stir? The obvious answer is race, the omnipresent backdrop to discussion of social policy in the United States. From the founding of the nation, the issue of race has preyed on the consciences of white Americans, especially the intellectual elite. This is natural and good: White America has had much to feel guilty about. But since the 1960s, for reasons that I do not fully understand, this long-standing discomfort among white elites has taken on the character of a sensitivity so acute that it resembles a disorder. Ever since the first wave of attacks on the book, I have had an image of The Bell Curve as a literary Rorschach test. I do not know how else to explain the extraordinary discrepancy between what The Bell Curve actually says about race and what many commentators have said that the book says, except as the result of some sort of psychological projection onto our text.
The problems that beset America’s intellectuals when they try to think about race are not the only reason for the hysteria, however. The Bell Curve also scraped a political nerve that was far more sensitive than either Richard Herrnstein or I had realized. When we began work on the book, both of us assumed that it would provide evidence that would be more welcome to the political left than to the political right, via this logic: If intelligence plays an important role in determining how well one does in life, and intelligence is conferred on a person through a combination of genetic and environmental factors over which that person has no control (as we argue in the book), the most obvious political implication is that we need a Rawlsian egalitarian state, compensating the less advantaged for the unfair allocation of intellectual gifts. Neither of us thought that the most obvious implication was the right one, for reasons we describe in Chapter 22. But we recognized the burden on us to make the case. And yet The Bell Curve has been widely attacked as a book written to advance a right-wing political agenda.
The reason for the attack arises partly from our known political positions. As those who have read either Losing Ground or In Pursuit2 know, I am on the right (though more libertarian than conservative) in my politics and Richard Herrnstein, although he had few published policy positions that could easily be characterized as liberal or conservative, had been denounced as a conservative because of his earlier writings on the heritability of IQ and its social consequences.3 A thoroughgoing liberal in his youth, Herrnstein had become moderately conservative over the last two decades of his life. We joked between ourselves that he was the Tory and I the Whig.
So we had political opinions. It goes with the territory. Social scientists who are absorbed in policy issues tend also to have opinions about those issues. A few, like us, are somewhere on the right. James Q. Wilson in political science (another Tory) and Richard Epstein in legal studies (another Whig) are prominent examples. More commonly, they are on the left. Christopher Jencks, William Julius Wilson, David Ellwood, Andrew Hacker, Robert Reich, and Robert Haveman, each of whom has published an important book on social policy in the past decade, are all social democrats of one sort or another, as are many of the most vocal critics of The Bell Curve (e.g., Stephen Jay Gould, Howard Gardner, and Leon Kamin). The appropriate question to ask of any of these authors vis-à-vis their books is not whether they have political opinions but whether their presentations of the data are distorted by their politics, and whether they enable readers to understand how their political views enter into their policy conclusions. On both counts, Richard Herrnstein and I intended The Bell Curve to be exemplary. Once again, you have in your hands the means to judge for yourself.
But I now understand, as neither of us fully anticipated during the writing, that this book must create a political firestorm among intellectuals simply by accepting that human beings differ widely in ways that matter to daily life, and that many of these differences, including intelligence, are not readily manipulated by public policy. We knew how heretical this position had been in the 1960s and 1970s (pp. 8-9 of the Introduction), but we underestimated how important it remains in the 1990s. Watching the reaction to the book, theologian Michael Novak and economist Thomas Sowell have written in similar terms how the dominant intellectual stance toward public policy continues to invest everything in a few core beliefs about society as the cause of problems, government as the solution, and the manipulability of human beings in reaching the goal of equality.4 For persons who hold this view, Novak writes, The Bell Curve’s “message cannot be true, because much more is at stake than a particular set of arguments from psychological science. A this-worldly eschatological hope is at stake. The sin attributed to Herrnstein and Murray is theological: they destroy hope.”5
I am sure Novak and Sowell are on the right track, though there is still more to be learned. The underlying reasons for the reaction to The Bell Curve will be significant in their own right when they are fully understood, revealing much about the intellectual temper of our era, but perspective on that reaction must wait for some years. Let me make a more limited prediction: When the Sturm und Drang has subsided, nothing important in The Bell Curve will have been overturned. I say this not because Herrnstein and I were so brilliant or farsighted but because our conclusions were so conservatively phrased and anchored so firmly in the middle of the scientific road. Therein also lies the best way for you to decide what to make of the various commentaries on the book. Take whatever critical review you find to be most persuasive, delete the rhetoric, and identify the bare bones of an assertion: “Here is what Herrnstein and Murray said, and here is why they are wrong.” Then go to the relevant section of the book and put the assertion side by side with what we actually said. You will find the exercise instructive.
THE UNINTENDED OUTCOMES OF THE ATTACKS ON THE BELL CURVE
In the meantime I want to present my own assessment of where the debate stands. The problem is how to do it within a reasonable space and how to avoid being overtaken by events. The first wave of reviews and commentaries in the major media appeared between October 1994 and January 1995. The second wave, consisting of reviews in the academic journals, is on the way. As I write, I have already seen manuscript copies of some of these reviews, often highly technical, that will be published over the next year. The volume of this material reaches many hundreds of pages. To comment in detail on them would require another book. I will use this Afterword instead to present a general proposition about The Bell Curve and to illustrate it with examples.
Much of the attack on The Bell Curve has a purpose that occasionally has been stated explicitly, but more often tacitly: somehow, to put the genie back in the bottle, quelling discussion of topics that the book brought into the open. This is a much more ambitious objective than merely disagreeing with our presentation, and it accounts (I hypothesize) for the attacks on our motives and character. It is not enough that The Bell Curve be refuted. It must be discredited altogether.
The trouble with this strategy is that it will backfire. My proposition is that the critics of The Bell Curve are going to produce the very effects that their attacks have been intended to avert. I foresee a three-stage process.
In the first stage, a critic approaches The Bell Curve absolutely certain that it is wrong. He feels no need to be judicious or to explore our evidence in good faith. He seizes on the arguments that come to hand to make his point and publishes them, with the invective and dismissiveness that seem to be obligatory for a Bell Curve critic.
In the second stage, the attack draws other scholars to look at the issue. Many of them share the critic’s initial assumption that The Bell Curve is wrong, but they nonetheless start to examine evidence they would not have looked at otherwise and discover that the data are interesting. Some of them back off nervously, but others are curious, and they look further. And it turns out not just that The Bell Curve’s initial arguments were right but that there is much more out there than Herrnstein and I try to claim.
In stage three, these scholars start to write new material on the topics that had come under attack in the first place. I doubt that many will choose to defend The Bell Curve, but they will build on its foundations and ultimately do far more damage to the critics’ “eschatological hope” than The Bell Curve itself did.
I will give four examples of these unintended outcomes, drawing from the attacks on the “pseudoscience” of a general intelligence factor, on the link between genes and race differences in IQ, on the power of the statistical evidence, and on our pessimistic assessment of society’s current attempts to raise IQ through outside interventions.
The “Pseudoscience” of a General Intelligence Factor
One main line of attack on The Bell Curve’s science has been mounted not against anything in the book itself but against the psychometric tradition on which it is based. Specifically, Herrnstein and I accept that there is such a thing as a general factor of cognitive ability on which human beings differ: the famous g.
Ever since the late 1960s, when IQ became a pariah in the world of ideas, this has been a politically incorrect position to take. In the early 1980s, two books cemented the discrediting of g: Stephen Jay Gould’s The Mismeasure of Man and Howard Gardner’s Frames of Mind: The Theory of Multiple Intelligences.6 Gould called the concept of g a fraud, and Gardner identified seven distinct and independent kinds of intelligence. Both of these views were swallowed uncritically and enthusiastically by the elite media, as documented by Mark Snyderman and Stanley Rothman in The IQ Controversy: The Media and Public Policy.7 From the time we began working on The Bell Curve, no putative refutations of our project were brought up nearly as often or as confidently as these two books when we talked with friends and colleagues who were not psychologists.
Among scholars who work in the field of intelligence, Gould and Gardner have different reputations. Many psychometricians enjoy Gardner’s work. He stretches the exploration of intelligence into new disciplines and keeps people from ignoring all the many ways in which humans exhibit special talents that fall outside the classical conception of intelligence. But to accept these virtues in Gardner’s work is not to say that he has demonstrated the existence of “multiple intelligences” of remotely equivalent value in today’s world. If you want to predict someone’s success in life, you had better focus on his scores for “linguistic intelligence” or “logical-mathematical intelligence”—roughly, the talents measured by IQ tests—rather than on any of Gardner’s other five intelligences. Furthermore, Gardner has fared no better than anyone else in showing that the elements of “intelligence” as commonly understood—such as the ability to manipulate complex information or solve new problems—are in fact statistically independent in the way that Gardner’s labeling of the seven intelligences would imply. Such mental abilities tend to go together. That brings us back to g and Stephen Jay Gould.
In The Mismeasure of Man, Gould based his denial of a general mental factor on a series of claims about the statistical method for identifying g. He resurrected the same arguments in his New Yorker review of The Bell Curve. “g cannot have inherent reality,” Gould writes, “for it emerges in one form of mathematical representation for correlations among tests and disappears (or greatly attenuates) in other forms, which are entirely equivalent in amount of information explained.” He continues: “The fact that Herrnstein and Murray barely mention the factor-analytic argument forms a central indictment of The Bell Curve and is an illustration of its vacuousness.” Where, Gould asks, is the evidence that g “captures a real property in the head”?8
The reason we “barely mention the factor-analytic argument” is that it has no scholarly standing. Gould’s statistical indictment of g was old news when it appeared. As Mismeasure’s reviewer in the British science journal Nature put it, Gould’s “discussion of the theory of intelligence stops at the stage it was in more than a quarter of a century ago.”9 Indeed, the appearance of Gould’s book coincided with a renaissance of work on g that proceeded wholly unaffected by Gould’s charges.
To see what this particular fight is about, a little more background than we give in the text is essential. As we noted in the Introduction (pp. 2-3), one of the earliest findings about mental tests was that the results of different tests of apparently different mental skills are positively correlated. Charles Spearman, the British founding father of modern psychometrics, was the first to hypothesize that the tests were correlated because each was tapping into a common construct: the general mental ability he then labeled g. The statistical technique of factor analysis is the method used to extract this general factor that accounts for the intercorrelations among subtests. Factor analysis permits alternative methods of extracting factors, however. The hero of Gould’s story is another pioneering psychometrician, L. L. Thurstone, who in the 1930s became Spearman’s great antagonist by demonstrating how factor analysis need not yield a dominant general factor. Gould is correct in stating that any of the alternative methods will have the same over-all power to account for the correlations among the tests.
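The point about alternative extraction methods can be checked numerically. The sketch below is an illustration only, not an analysis from the book: the four-test correlation matrix is invented, principal components stand in for a full factor-analytic extraction, and the names `principal_loadings`, `rotation_2d`, and `variance_by_factor` are hypothetical helpers. It shows that an orthogonal rotation reproduces exactly the same correlations while redistributing the variance among the factors.

```python
import numpy as np

def principal_loadings(corr, k):
    """Loadings of the first k principal components of a correlation matrix."""
    eigvals, eigvecs = np.linalg.eigh(corr)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]        # indices of the k largest
    return eigvecs[:, order] * np.sqrt(eigvals[order])

def rotation_2d(theta):
    """Orthogonal 2x2 rotation matrix."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def variance_by_factor(loadings):
    """Variance accounted for by each factor (sum of squared loadings)."""
    return (loadings ** 2).sum(axis=0)

# Invented correlations: two clusters of tests, every correlation positive.
corr = np.array([
    [1.0, 0.6, 0.3, 0.3],
    [0.6, 1.0, 0.3, 0.3],
    [0.3, 0.3, 1.0, 0.6],
    [0.3, 0.3, 0.6, 1.0],
])

unrotated = principal_loadings(corr, k=2)          # dominant first factor
rotated = unrotated @ rotation_2d(np.pi / 4)       # an alternative, equivalent solution

print("unrotated variance by factor:", np.round(variance_by_factor(unrotated), 3))
print("rotated variance by factor:  ", np.round(variance_by_factor(rotated), 3))
print("same reproduced correlations:",
      np.allclose(unrotated @ unrotated.T, rotated @ rotated.T))
```

With this toy matrix the unrotated solution assigns 2.2 and 1.0 units of variance to the two factors, while the rotated solution assigns 1.6 to each. The total and the reproduced correlations are identical in both cases, which is the sense in which the alternatives have the same overall power.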
Gould is wrong, however, when he implies that by using an alternative method, an analyst can get rid of g. As Richard Herrnstein liked to say, “You can make g hide, but you can’t make it go away.” For those who want to pursue the technical issues, I recommend John B. Carroll’s recent book, Human Cognitive Abilities: A Survey of Factor-Analytic Studies.10 Carroll, a student of Thurstone and former director of the L. L. Thurstone Psychometric Laboratory at the University of North Carolina, recounts the controversy between Spearman and Thurstone over the existence of a general factor, pointing out that Thurstone proposed reasonable criteria for choosing among possible solutions to the factorial problem. In his later years, Thurstone also came to accept the notion of a general factor arising out of the correlations among “lower-order” factors.
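One way to see why a general factor can reappear at a higher level is to look at how group-level scores relate to one another. The following is a minimal sketch under stated assumptions: the correlation matrix is invented, unit-weighted composites stand in for "lower-order" factors, and `composite_correlation` is a hypothetical helper. It is not Carroll's or Thurstone's procedure.

```python
import numpy as np

def composite_correlation(corr, group_a, group_b):
    """Correlation between two unit-weighted sums of standardized tests."""
    corr = np.asarray(corr, dtype=float)
    a = np.zeros(corr.shape[0])
    b = np.zeros(corr.shape[0])
    a[list(group_a)] = 1.0
    b[list(group_b)] = 1.0
    cov_ab = a @ corr @ b                     # covariance of the two sums
    return cov_ab / np.sqrt((a @ corr @ a) * (b @ corr @ b))

# Invented correlations: tests 0-1 form one cluster, tests 2-3 another.
corr = np.array([
    [1.0, 0.6, 0.3, 0.3],
    [0.6, 1.0, 0.3, 0.3],
    [0.3, 0.3, 1.0, 0.6],
    [0.3, 0.3, 0.6, 1.0],
])

r_groups = composite_correlation(corr, [0, 1], [2, 3])
print(f"correlation between the two cluster composites: {r_groups:.3f}")
```

In this toy case the two cluster scores correlate at 0.375. Whenever the tests across clusters are positively correlated, the group-level scores are too, and that shared variance is what a higher-order general factor would summarize.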
In any case, it has been decades since Gould’s statistical argument has been a live issue among those who specialize in factor analysis. There are established technical grounds for permitting factor analysis to extract a general factor from a battery of mental tests. Doing so will yield a dominant factor that not only explains more of the variance than any other factor but typically explains three times as much variance as all of the other factors combined. Thus the frustration among psychometricians who tried to get rid of g. After applying the particular factor-analytic method that prevents g from emerging, there was nowhere to take the results. Anyone who tried to label the independent factors as being distinct mental skills and develop a research agenda based on them was crushed by critics who could demonstrate that the results were more elegantly explained by g. For more than half a century, the holy grail of psychometrics was a set of statistically independent primary mental abilities. Careers were consumed in the search. No one succeeded.
But there is a more direct way of asking whether g is a valid construct: g is a construct in the same way that energy is a construct. Both have theoretical underpinnings, but neither is a reified “thing.” Evidence that they are useful constructs is found in the ways they relate to real-world phenomena. In the case of g, we have three possibilities. One is that g is an arbitrary creation of number crunching. If so, it should be nothing but noise in statistical analysis, showing no more relationship to phenomena in the real world than numbers generated by a random number table. A second possibility is that g is a surrogate for something else—a proxy measure of educational attainment, perhaps, or socio-economic background. If this is the case, the correlations of g to real-world phenomena are spurious, and it should be easy to demonstrate by showing that the “real” causes (such as educational attainment) can explain everything that g explains, more parsimoniously. The third possibility is that g is a (partly) biological phenomenon in its own right—a basic characteristic of the organism that exerts some influence on its ability to reason, think, and learn.
On the first two possibilities, the empirical record is rich and large. Chapter 3 tells this story for job productivity, showing how g explains productivity in ways that education and socioeconomic background cannot. The eight chapters of Part II tell the story for a wide variety of social indicators, again after taking the contributions of education or socioeconomic status into account. As for the third possibility, that g is a biological phenomenon, let us count the ways in which g seems to capture a “real property in the head.”
First, a growing body of evidence links g, and IQ scores more generally, with neurophysiological functioning and a genetic ground: The higher the g loading of a subtest is, the higher is its heritability. The higher the g loading of a subtest is, the higher is the degree of inbreeding depression (an established genetic phenomenon). Reaction times on elementary cognitive tasks that require no conscious thought, such as responding to a lighted button, show a significant correlation with IQ test scores. This correlation depends mostly, perhaps entirely, on g. A significant relationship exists between g and evoked electrical potentials of the cerebral cortex. A significant inverse relationship exists between nonverbal (and highly g loaded) IQ test scores and the brain’s consumption of glucose in the areas of the brain tapped by the cognitive test. The higher the scores are on IQ tests, the faster is the speed of neural and synaptic transmission in the visual tract.11
For each of these statements, there is a corollary: No alternative casting of the test items can compete with g in producing such results. For example, suppose you give a psychometrician the chance to extract g and leave you with all the remaining factors in a given mental test. You cannot manipulate any one or any combination of those factors so as to produce the relationships I just listed. Only g, that supposedly arbitrary creation of the psychometricians, can do so. To sum up: The reality and importance of g have long since, in many ways, been established independently of its statistical properties.
Gould’s popularity is such that his review in the New Yorker was circulated by some nonpsychologists as the canonical refutation of The Bell Curve. But I think he made a mistake in reraising the factor-analytic argument. By doing so, he accomplished something that The Bell Curve alone could not do: He made scholars who know what the evidence shows angry enough to go public. By and large, scholars of intelligence are reclusive. The experiences in the 1970s of people like Arthur Jensen, Hans Eysenck, and Richard Herrnstein himself taught them that the consequences of being visible can be extremely punishing. But Gould was saying things that, to professionals in the field, were palpably wrong about a topic of deep importance. The early results were a few outraged letters sent to the New Yorker (none was printed). Then came a statement of mainstream intelligence signed by fifty-two scholars and published in the Wall Street Journal in which all of the main scientific findings of The Bell Curve were endorsed (without any explicit mention of the book or its critics).12 I also hear second-hand that reporters have called scholars about “this pseudoscience g business” and received an answer that they did not expect.
These may be harbingers of a shift in the media’s treatment of intelligence. There is now a real chance that the press will begin to discover that it has been missing the story. The big news about the study of intelligence is not that science has moved beyond the concept of a general mental ability but the remarkable resilience and utility of this construct called g.
Race, IQ, and Genes
I come now to the second example of how the attacks on The Bell Curve are likely to have unintended consequences: the determination of the critics to focus on race and genes, even though The Bell Curve does not.
In Chapter 13, The Bell Curve draws three important conclusions about intelligence and race: (1) All races are represented across the range of intelligence, from lowest to highest. (2) American blacks and whites continue to have different mean scores on mental tests, varying from test to test but usually about one standard deviation in magnitude—about fifteen IQ points. “One standard deviation” means roughly that the average black scores at the sixteenth percentile of the white distribution. (3) Mental-test scores are generally as predictive of academic and job performance for blacks as for other ethnic groups. Insofar as the tests are biased at all, they tend to overpredict, not underpredict, black performance.
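The arithmetic behind "one standard deviation" and "the sixteenth percentile" can be reproduced directly. The sketch below assumes normal distributions with equal standard deviations and the conventional IQ scale of 15 points per standard deviation; the function names are hypothetical and chosen only for illustration.

```python
import math

def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def percentile_in_reference(gap_in_sd):
    """Percentile, within a reference normal distribution, of a score
    lying gap_in_sd standard deviations below that distribution's mean."""
    return 100.0 * normal_cdf(-gap_in_sd)

def sd_to_iq_points(gap_in_sd, sd_points=15.0):
    """Convert a gap in standard deviations to IQ points (15 per SD assumed)."""
    return gap_in_sd * sd_points

print(f"{percentile_in_reference(1.0):.1f}")   # percentile for a 1 SD gap
print(f"{sd_to_iq_points(1.0):.0f}")           # IQ points for a 1 SD gap
```

A gap of one standard deviation corresponds to roughly the 15.9th percentile of the reference distribution, which rounds to the sixteenth.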
These facts are useful in the quest to understand why (for example) occupational and wage differences separate blacks and whites, or why aggressive affirmative action has produced academic apartheid in our universities. More generally, Herrnstein and I believe that a broad range of American social issues cannot be interpreted without understanding the ways in which intelligence plays a role that is often, and wrongly, conflated with the role of race. When it comes to government policy, and as we say emphatically at various points in Part IV, there is just one authentic policy implication: Return as quickly as possible to the cornerstone of the American ideal—that people are to be treated as individuals, not as members of groups.
The furor over The Bell Curve and race has barely touched on these core points. Instead, the critics have been obsessed—no hyperbole here—with genes, trying to stamp out any consideration of the possibility that race differences have a genetic component.
You may read everything we say about the relationship of genes to race differences in intelligence on pp. 295-315. Our position does not take long to summarize, however: A legitimate scientific debate on the topic is underway; it is scientifically prudent at this point to assume that both environment and genes are involved, in unknown proportions; and, most important, people are getting far too excited about the whole issue. Genetically caused differences are not as fearful, or environmentally caused differences as benign, as many think. What matters is not the source but the existence of group differences and their intractability (for whatever reasons).
As I have watched the frenzied attacks on this scientifically unexceptional part of the book, I have decided that Richard Herrnstein and I were what is known as “prematurely right.” Certainly we were right empirically when we observed that the public at large is fascinated by the possibility of genetic differences (pp. 296-297) and that the intellectual elites have been “almost hysterically in denial about that possibility” (p. 315). I think we were right in trying to dampen that fascination. But as I listen to some of my most loyal friends insisting that I must be disingenuous when I continue to say that the genetic question is not a big deal, it also appears that Herrnstein and I failed to make the case persuasively. This does not mean I can now improve our presentation. I have reread the concluding pages of Chapter 13 many times since the publication of the book, pondering how we could have stated our case more clearly. To this day, I have no good ideas. As far as I can tell, we said it right the first time.
My main point here is that the attacks on our discussion of genes and race are not doing any good for the cause of those who want to discredit the idea that genes could be involved. They have based their attacks on the premise that a full, fair look at the data will make the issue go away. No one appears to have recognized that Herrnstein and I did not make nearly as aggressive a case for genetic differences as the data permit.
The most abundant source of data that we downplayed is in the work of J. Philippe Rushton, a psychologist who since 1985 has been publishing increasingly detailed data to support his theory that the three races he labels Negroid, Caucasoid, and Mongoloid vary not just in intelligence but in a wide variety of other characteristics. We put our discussion of Rushton in Appendix 5. The critics of The Bell Curve are putting him on the front page, often outrageously caricaturing his work. The trouble with this strategy is that Rushton is a serious scholar who has assembled serious data.13 Consider just one example: brain size. One of the most memorable features of Gould’s The Mismeasure of Man was his ridicule of the attempts by nineteenth-century scientists to establish a relationship between cranial capacity and intelligence. But the empirical reality, verified by numerous modern studies, including several based on magnetic resonance imaging, is that a significant and substantial relationship does exist between brain size and measured intelligence after body size is taken into account and that the races do have different distributions of brain size.14 Rushton brings this large empirical documentation together. The attacks on The Bell Curve ensure that such data will get more attention.
Among those who have tried to quell any consideration that genes might play a role in racial differences, Charles Lane and Leon Kamin probably miscalculated most egregiously. I refer to their highly publicized attack on the “tainted sources” used in The Bell Curve. Lane introduced this theme with an initial article in the New Republic and then a much longer one in the New York Review of Books.15 In the latter piece, he proclaimed that “no fewer than seventeen researchers cited in the bibliography of The Bell Curve have contributed to Mankind Quarterly … a notorious journal of ‘racial history’ founded, and funded, by men who believe in the genetic superiority of the white race.” Lane also discovered that we cite thirteen scholars who have received funding from the Pioneer Fund, founded and run (he alleged) by men who were Nazi sympathizers, eugenicists, and advocates of white racial superiority. Leon Kamin, a vociferous critic of IQ in all its manifestations, took up the same argument at length in his review of The Bell Curve in Scientific American.16
Never mind that The Bell Curve draws its evidence from more than a thousand scholars. Never mind that among the scholars in Lane’s short list are some of the most respected psychologists of our time and that almost all of the sources referred to as tainted are articles published in leading refereed journals. Never mind that the relationship between the founder of the Pioneer Fund and today’s Pioneer Fund is roughly analogous to that between Henry Ford and today’s Ford Foundation. The charges have been made, they have wide currency, and some people will always believe that The Bell Curve rests on data concocted by neo-Nazi eugenicists.
But in the process of making their case, Lane and Kamin tried to go beyond guilt by association: They tried to demonstrate the specific ways in which these Mankind Quarterly and Pioneer Fund scholars we cited were racist. To do that, they focused on our citations of studies of African IQ.
The topic of African IQ is a tiny piece of The Bell Curve—three paragraphs on pp. 288-289 intended to address a hypothesis Herrnstein and I heard frequently: The test scores of American blacks have been depressed by the experience of slavery and African blacks will be found to do better. We briefly summarize the literature indicating that African blacks in fact have lower test scores than American blacks.
Lane and Kamin assault this conclusion ferociously. We are an easy target. We say so little about African IQ that it is easy for Lane and Kamin to point to the many technical difficulties of knowing exactly what is going on. But we also omit many more details that make a strong case that African blacks have very low scores on standardized mental tests. Lane and Kamin want our sources to be weak and racist. That they are not bears importantly, if inconclusively, on possible genetic racial differences.
Blinded to that possibility by their seeming prejudgment of the issue, Lane and Kamin apparently are not worried about what will happen when their critiques lead other scholars to explore the studies that we cited. They should be. Even when samples of Africans are selected in ways that will tend to bias the results upward (for example, by limiting the sample to people who have completed primary school, by which time many of the least able have dropped out, to people who are employed, or to people who live in urban areas), and even when the tests involved are ones such as Raven’s Standard Progressive Matrices (SPM), designed for cross-cultural comparisons and devoid of any requirements of literacy or numeracy, the scores of African samples everywhere have been in the region of two standard deviations below European or East Asian means. The studies vary in quality, but some are excellent, and it is not the case that the better the study is, the higher the African score is found to be. On the contrary, some of the lowest scores have been found in the largest, most careful, and most recent studies.
To illustrate how troubling the results have been, let me turn to two studies postdating Richard Lynn’s review that we cite on p. 289. One was a South African study led by Kenneth Owen published in the refereed British journal Personality and Individual Differences.17 Its sample consisted of enrolled seventh-grade students: 1,056 whites, 778 coloureds (mixed race), 1,063 Indians, and 1,093 blacks. The SPM was administered without time limits. Except for the Indians, subjects were tested by school psychologists of the same ethnic group. Owen presents the full psychometric profile for the test results (distributional characteristics, reliability, item difficulty, item discrimination, congruence coefficients, and discriminant analysis), demonstrating that the test was measuring the same thing for the various ethnic groups. The differences in test means, expressed in standard deviations, were as follows: Indian-white: −.52; coloured-white: −1.35; black-white: −2.78.
The second example of a recent, careful study was conducted by a black scholar, Fred Zindi, and published in the Psychologist.18 It matched 204 black Zimbabwean pupils and 202 white English students from London inner-city schools for age (12-14 years old), sex, and educational level, both samples being characterized as “working class.” Despite the fact that the white sample was well below average for the whites, with a mean IQ measured by the Wechsler Intelligence Scale for Children-Revised (WISC-R) of only 95, the black-white difference was 1.97 standard deviations on the SPM and 2.36 standard deviations on the WISC-R. Professor Zindi expressed the SPM results as IQ scores. The means for the Zimbabwean sample were 72 for the SPM and 67 for the WISC-R, consistent with Richard Lynn’s estimates. There is reason to think that the WISC-R score was somewhat depressed by language considerations but not much: The (nonverbal) performance IQ score of the Zimbabwean sample was only 70.
What should one make of these results? Above all, one must proceed cautiously in drawing conclusions, for all the reasons that kept us from presenting these results in detail in The Bell Curve. The problem is not, as often alleged, that such studies are written by racists (in the two instances just cited, a charge belied by Owen’s scholarly reputation and by Zindi’s race) but that the African story is still so incomplete. Our view was that the current differences will narrow over time, probably dramatically, as nutrition and the quality of schools for black Africans improve. Changes in black African culture may provide an environment more conducive to cognitive development among young children. But the current differences as measured through these samples as of the 1990s are not figments of anyone’s imagination. Lane, Kamin, and others who have attempted to discredit The Bell Curve by focusing on our “tainted sources” have ensured that the African data will get more attention.
The Statistical Importance of The Bell Curve’s Results
The third line of attack on The Bell Curve that I predict will have an unintended outcome is the attempt to dismiss the statistical power of the book’s results. Perhaps the most important section of The Bell Curve is Part II, the series of chapters describing the relationship of IQ to poverty, school dropout, unemployment, divorce and illegitimacy, welfare, parenting, crime, and citizenship, using non-Latino whites from the National Longitudinal Survey of Youth. The eight chapters in this part deal with questions like, “What role does IQ play in determining whether a woman has a baby out of wedlock?” Or: “What are the comparative roles of socioeconomic disadvantage and IQ in determining whether a youngster grows up to be poor as an adult?” These are fascinating questions. But you will have a hard time figuring out from the published commentary on The Bell Curve that such questions were even asked, let alone what the answers were.
Instead, the main line of attack has been that no one really needs to pay any attention to those chapters because Herrnstein and I are confusing correlation with causation, IQ really does not explain much of the variance anyway, and even if that were not true, our measure of socioeconomic background is deficient. On all three counts, the critics are setting up a reexamination of the existing technical literature on social problems that will be embarrassing to them in the end.
First, regarding correlation and causation, read pp. 122-124 of the Introduction to Part II. Reduced to its essentials: The nonexperimental social sciences cannot demonstrate unequivocal causality. In trying to explain such social problems as poverty, illegitimacy, and crime, we use statistics to show what independent role is left for IQ after taking a person’s age, socioeconomic background, and education into account. When there are other obvious explanations—family structure, for example—we take them into account as well. Apart from the statistics, we describe in commonsense terms what the nature of the causal link might be—why, for example, a poor young woman of low intelligence might be more likely to have a baby out of wedlock than a poor young woman of high intelligence. At the end of this exercise, repeated in similar form for each of the eight chapters in Part II, there will still be unanswered questions, and we point out many of those unanswered questions ourselves. But readers will know more than they knew before, and the way will be opened for further explorations by our colleagues.
The statistical method we used throughout is the basic technique for discussing causation in nonexperimental situations: regression analysis, usually with only three independent variables. We interpret the results according to accepted practice. To enable readers to check for themselves, the printout of all the results is shown in Appendix 4.
The assault on this modest but useful analysis has been led by Leon Kamin in his Scientific American review. He argues that we cannot disentangle the role of IQ from socioeconomic background and suggests that in our database, the children of laborers have such uniformly low IQ scores that we cannot possibly tell whether the low IQ or the disadvantaged background is to blame for the higher rates of crime, unemployment, and illegitimacy that afflict such youngsters. “The significant question,” Kamin writes, “is, why don’t the children of laborers acquire the skills that are tapped by IQ tests?”19
The answer to his significant question is that they do acquire such skills often enough to permit a good look at the comparative roles of socioeconomic background and IQ. Of the non-Latino whites used in the analyses throughout Part II, 1,589 came from families in the bottom quartile on our SES index. Of these, 451 had above-average IQs and 147 were in the top quartile of IQ. As we report throughout Part II, the results are encouraging: In America, bright children of laborers tend to do well in life, despite their humble origins. Herrnstein and I suggest that such a pattern points to causation. This is indeed an inference—a sensible inference.
We approach the correlation-causation tangle in other sensible ways as well. Consider the vexing case of education. People with high IQs tend to spend many years in school; people with low IQs tend to leave. Does the IQ cause the years of education, or the years of education the IQ? As we also discuss in the Introduction to Part II (pp. 124-125), it is unwise, for various technical reasons, to enter years of education as an additional independent variable, so instead we define two subsamples, each with homogeneous education: one of adults who had completed exactly twelve years of school and obtained a high school diploma, no more and no less; the other of adults who had completed exactly sixteen years of school and obtained a bachelor’s degree, no more and no less. This enables us to report the independent effect of IQ for people with identical education.
Our procedure irritated a number of academic critics, who grumble that the state of the art permits much more. Yes, it does, and in the book we mention periodically how much we look forward to watching our colleagues apply more sophisticated techniques to unanswered questions. But more sophisticated modeling techniques would also have opened a wide variety of technical questions that Herrnstein and I wanted to avoid. The procedure we chose gives an excellent way of bounding the independent effects of education, and that was our purpose.
But let us say a critic grants the existence of independent relationships between IQ and social outcomes after holding other plausible causes constant. How important are these “independent relationships”? Trivially so, says Stephen Jay Gould in his New Yorker review. The Bell Curve can safely be dismissed, he says, because IQ explains so little about the social outcomes in question—just a few percent of the variance, in the statistician’s jargon.
Here is the truth: The relationships between IQ and social behaviors that we present in the book are so powerful that they will revolutionize sociology. They are not only “significant” in the standard statistical sense of that phrase but are as well powerful in a substantive sense—often much more powerful than the relationships linking social behaviors with the usual suspects (education, social status, affluence, ethnicity). Not only are the attacks on these relationships unwarranted, but Herrnstein and I actually understate the strength of the statistical record. The story is complex but worth recounting because it tells so much about the academic response to The Bell Curve.
In ordinary multiple regression analysis, two statistics are of special interest. The first is the set of regression coefficients, one for each independent variable, explaining the magnitude of the effect each independent variable has on the dependent variable after taking the role of all the other independent variables into account. Each coefficient has a standard error, which may be used to determine whether the coefficient is statistically significant (i.e., unlikely to have been produced by chance). The second statistic of special interest is the square of the multiple correlation, R², which tells how much of the variance in the dependent variable is explained by all the independent variables taken together.
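The two statistics just described can be made concrete with a minimal sketch. Everything in it is an illustrative assumption rather than anything from the book: the helper names `invert` and `ols`, the two made-up predictors, and the eight made-up observations, which bear no relation to the NLSY. It fits an ordinary least-squares regression and reports the coefficients, their standard errors, and R².

```python
import math

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def ols(rows, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 + ...

    Returns (coefficients, standard_errors, r_squared); index 0 is the intercept.
    """
    X = [[1.0] + list(r) for r in rows]          # prepend the intercept column
    n, k = len(X), len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
           for a in range(k)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    y_bar = sum(y) / n
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    sigma2 = ss_res / (n - k)                    # residual variance estimate
    se = [math.sqrt(sigma2 * inv[a][a]) for a in range(k)]
    return beta, se, 1.0 - ss_res / ss_tot

# Made-up data: y is built as 1 + 2*x1 - 3*x2 plus a small disturbance.
x1 = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
x2 = [1.0, 0.0, 1.0, 0.0, 2.0, 1.0, 0.0, 2.0]
noise = [0.5, -0.5, 0.5, -0.5, -0.5, 0.5, -0.5, 0.5]
y = [1.0 + 2.0 * a - 3.0 * b + e for a, b, e in zip(x1, x2, noise)]

beta, se, r2 = ols(list(zip(x1, x2)), y)
for name, b, s in zip(["intercept", "x1", "x2"], beta, se):
    print(f"{name}: coefficient={b:.3f}, standard error={s:.3f}")
print(f"R-squared={r2:.3f}")
```

Each coefficient answers how much the outcome changes per unit of one predictor with the other held constant, while R² summarizes the fit of the whole equation. They are computed from the same fit but answer different questions.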
One of the early topics about multiple regression that graduate students study is the different uses of regression coefficients and R². If a coefficient is both large in a substantive sense and statistically significant, it is typically the statistic of main interpretive importance, while R² is of secondary and sometimes trivial importance. Such is the case with the kind of analysis in The Bell Curve, for reasons we explain in Appendix 4. In all this, we treat our data as our colleagues around the country treat regression results every day. There is nothing controversial here, as evidenced by the fact that none of the quantitative social scientists who reviewed this part of the manuscript for The Bell Curve raised a question about our methods.
But that is not the end of the story. Herrnstein and I refer to the R²s in the introduction to Part II and in Appendix 4 as if they represent “explained variance,” and thereby we commit a technical error, falsely understating the overall explanatory power of our statistics. In logistic regression analysis with binary dependent variables, the kind of analysis we used throughout Part II, the statistic labeled R² is an ersatz and unsatisfactory attempt to express the model’s goodness of fit. Most statisticians to whom I have talked since say we should have ignored it altogether. Stephen Jay Gould fell into the same error.
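To see why such a statistic is not "explained variance," it may help to look at how one common version is built. The sketch below assumes McFadden's pseudo-R², which compares the log-likelihood of the fitted model with that of an intercept-only model. Which variant the software behind Appendix 4 reported is not stated here, so that choice is an assumption, as are the function names and the twelve made-up observations.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def fit_logistic(x, y, iterations=25):
    """Fit P(y=1) = sigmoid(a + b*x) by Newton-Raphson; returns (a, b)."""
    a = b = 0.0
    for _ in range(iterations):
        p = [sigmoid(a + b * xi) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Gradient of the log-likelihood with respect to a and b
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for yi, pi, xi in zip(y, p, x))
        # 2x2 information matrix (negative Hessian)
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system in closed form
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

def log_likelihood(y, probabilities):
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
               for yi, pi in zip(y, probabilities))

# Made-up binary outcome and a single made-up predictor (not NLSY data).
x = [-3.0, -2.0, -2.0, -1.0, -1.0, 0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 3.0]
y = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1]

a, b = fit_logistic(x, y)
ll_model = log_likelihood(y, [sigmoid(a + b * xi) for xi in x])
base_rate = sum(y) / len(y)
ll_null = log_likelihood(y, [base_rate] * len(y))   # intercept-only model

# McFadden's pseudo-R-squared: a ratio of log-likelihoods, not of variances.
pseudo_r2 = 1.0 - ll_model / ll_null
print(f"slope={b:.3f}, pseudo R-squared={pseudo_r2:.3f}")
```

Because the quantity is a ratio of log-likelihoods, it has no direct reading as a share of variance accounted for, even though it runs from 0 to 1 like the R² of ordinary regression.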
Once again, Gould’s criticism has been picked up by many others. It would be nice if a few respected professors would publicly point out that whatever else one might think about The Bell Curve, the criticisms about the small R²s in The Bell Curve are wrong. But this is unlikely to happen. Probably the allegation will quietly fade away as the academics who know the true story discreetly impart the news to those who do not.
The unfounded criticisms of the statistics in The Bell Curve that I have discussed so far will cause merely embarrassment among a few who both understand the issues and have the decency to be embarrassed. The real potential for backfire in the statistical critique of The Bell Curve comes from the attack on our use of socioeconomic status (SES).
Measures of SES are a staple in the social sciences. Leaf through the dozens of technical articles in sociology and economics dealing with issues of success and failure in American life, and you will frequently find a measure of SES as part of the analysis. A major purpose of The Bell Curve was to add IQ to SES as an explanatory variable. To avoid controversy, we deliberately constructed an SES index that uses the same elements everybody else does: income, occupation, and education. We did not have any reason for weighting any of these more heavily than the other, so we converted them to what are called “standard scores” and added them up to get our index, all of which would ordinarily have caused no comment.
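The construction described above, converting each component to standard scores and summing them with equal weight, can be sketched as follows. The function names and the four made-up families are hypothetical, and the sketch makes no attempt to reproduce the book's handling of missing data or the exact definitions of its variables.

```python
from statistics import mean, pstdev

def standard_scores(values):
    """Convert raw values to z-scores (mean 0, standard deviation 1)."""
    m = mean(values)
    s = pstdev(values)
    return [(v - m) / s for v in values]

def composite_index(*components):
    """Sum the z-scores of the components, giving each one equal weight."""
    z_columns = [standard_scores(column) for column in components]
    return [sum(row) for row in zip(*z_columns)]

# Four made-up families (illustrative values only, not NLSY data).
income = [12.0, 20.0, 35.0, 60.0]        # thousands of dollars
occupation = [20.0, 35.0, 50.0, 80.0]    # hypothetical occupational prestige score
education = [10.0, 12.0, 14.0, 18.0]     # years of schooling

index = composite_index(income, occupation, education)
for family, value in enumerate(index, start=1):
    print(f"family {family}: index = {value:+.2f}")
```

Standardizing first puts dollars, prestige points, and years of schooling on a common scale, so that no component dominates the sum merely because of its units.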
But when it comes to The Bell Curve, a standard SES index suddenly becomes problematic. James Heckman notes ominously that we did not have income data for a large part of the sample.20 Arthur Goldberger looks suspiciously on the idea of standardizing the variables.21 Leon Kamin figures that probably the self-reports of income, education, and occupation are exaggerated in ways that falsely produce the relationships we report.22
My response to such criticisms is, “Fine. Let’s test out these potential problems.” Compare the results for the subsamples with and without income data. Do not standardize the variables; create some other scales, and use some other method of combining them. Examine the validity of the self-report data. If one does not like the idea of using an index at all, there is a simple solution: Enter the constituent variables separately, and ask directly how parental education, income, and occupation compete individually with the independent role of the child’s IQ.
As scholars are supposed to, Herrnstein and I checked out these and many other possibilities—the results we report were triangulated in numbing detail during the years we worked on the book—and we knew before publishing The Bell Curve what the critics who bother to retrace our tracks will discover: There is no way to construct a measure of socioeconomic background using the accepted constituent variables that makes much difference in the independent role of IQ.23 In the jargon, our measure of SES is robust and as valid as everyone else’s has been.
But there’s the rub: How valid have everyone else’s been? Until The Bell Curve came along, measures of SES similar to ours were used without a second thought. Now, suddenly, they are to be questioned. I doubt whether the questioning will be confined to just The Bell Curve. But there is not much room to improve such measures, for there is no way around it: SES is in fact dominated by occupation, education, and income. What we did in the book is, in effect, to throw down a challenge: Anyone who does not like the way IQ dominates this thing called “socioeconomic status” in producing important social outcomes should come up with another way of measuring the environment.
Such measures have been emerging over the past few decades. The HOME index we discuss in Chapter 10 is an example. But the social sciences have only scratched the surface. It is now broadly accepted, as it was not only a decade ago, that the presence of the biological father in the home has many important positive effects on children independent of SES. How much more might be understood if we could add to mere presence a good measure of competence. Suppose, for example, that one could create a good measure of the “competency of a father in the raising of a female child.” That might have a large independent effect on the girl’s chances of giving birth to a baby out of wedlock, whatever her IQ. Suppose that one could create a good measure of the “degree to which a young male is raised in an environment where high moral standards are enforced consistently and firmly.” I can imagine this having a major effect on the likelihood of becoming a criminal, independent of IQ.
But the same measures that compete with the importance of IQ are going to make starkly clear something that The Bell Curve has already suggested: The kinds of economic and social disadvantages that have been treated as decisive in recent discussions of social policy are comparatively unimportant. It may sound like an issue that concerns only social scientists. Far from it. If I were to nominate the biggest sleeper effect to emerge from The Bell Curve, it would be the degree to which the book undermines SES as a way of interpreting social problems, and with it the rationale for many of the social policies that came into vogue in the 1960s.
The Malleability of IQ
Raising the question of policy brings us to the last of my four examples of the potential backfire effect of attacks on The Bell Curve: the malleability of IQ. These attacks focused on Chapter 17. The cries of protest have been almost as loud as those directed to our chapter on race, and for the reason that Michael Novak identified: By arguing that no easy ways of raising IQ exist, we “destroy hope,” or at least the kind of hope that drives many of the educational and preschool interventions for disadvantaged youth.
Actually, we do express hope. Because the environment plays a significant role in determining intelligence—a point The Bell Curve states clearly and often—we say that sooner or later researchers ought to be able to figure out where the levers are. We urge that steps be taken to hasten the day when such knowledge becomes available. But in examining the current state of knowledge, we also urge realism. Speaking of the most popular idea, intensive intervention for preschoolers, we indeed conclude that “we and everyone else are far from knowing whether, let alone how, any of these projects have increased intelligence” (p. 409). We also predict that “many ostensibly successful projects will be cited as plain and indisputable evidence that we are willfully refusing to see the light” (p. 410).
This prediction has been borne out. Psychologist Richard Nisbett, for example, writing in The Bell Curve Wars, accuses us of being “strangely selective” in our reports about the effects of intervention and wonders if we were “unaware of the very large literature that exists on the topic of early intervention.”24 The “very large literature” of which we were unaware? The only study Nisbett mentions is a 1992 Pediatrics article about a program to provide special services to low birth weight babies. He describes the results as showing a nine-point IQ advantage at age 3 for participants in the intervention. Nisbett neglects to acknowledge the unreliability and instability of IQ measures at age 3. He fails to mention that two IQ measures were used, with the second one showing a gain of just 6.4 points. Most important, Nisbett fails to mention that a follow-up of the same project was published in 1994, when the children were age 5, old enough that IQ scores are beginning to become interpretable. The results? The experimental group had an advantage of just 2.5 points on one measure of IQ and two-tenths of a point on another—both differences being substantively trivial and statistically insignificant.25
There is one slender lead in this study. The 1994 follow-up broke down the results according to whether the babies were “lighter LBW [low birth weight]” (less than 2,000 grams) or “heavier LBW” (2,001-2,500 grams). That comparison showed a gain of 6.0 IQ points for the heavier LBW group on one IQ test and a 3.7 point gain on the other IQ test—not large gains but nevertheless significant. But with each bit of good news goes bad news: The lighter LBW sample showed a gain of only 0.6 point on one IQ test and a drop of 1.5 points on the other. This is the familiar story from Chapter 17: a little hope, as much disappointment, no breakthroughs.
The larger problem with those who claim that Herrnstein and I were too pessimistic is that they conflate improvements in educational achievement with improvements in cognitive ability. The distinction is crucial: Do we know how to take a set of youngsters with a given tested IQ and reliably improve their educational achievement? Yes. Do we know how to take a set of youngsters with a given tested IQ that would not (for example) allow them to become engineers, and reliably raise their cognitive functioning so that they can become engineers? No.
I will venture two broad statements on this issue. First, in the critiques to date, no one has pointed to a credible study showing evidence of significant, long-term effects on IQ scores that we do not consider in The Bell Curve. Second, our account of the record to date is, if anything, generous. The two major intensive interventions for raising the IQ of children at high risk of mental retardation, the Milwaukee Project and the Abecedarian Project, have come under intense methodological criticism in the technical literature. We allude to the controversy on pp. 407-409, but in neither case is the evidence so clear that it was appropriate for us to come down hard on the “no-effect” conclusion, so we do not. If we err, it is in the direction of giving more credit to the interventions than is warranted.
But just as we predicted, many others are nominating “programs that work” that we mysteriously failed to consider. I am sure that some of them do work—for goals other than raising IQ. We would be the last to suggest that education cannot be made better, for example, or that the socialization of children cannot be improved. But in The Bell Curve we talk about a particular goal: improving the cognitive functioning of human beings over the long term. On that score, the record remains as Herrnstein and I describe it: Yes, it can be done, but apparently only in modest amounts for most children, usually temporarily, and inconsistently.
In this instance, I have reason to hope that the unintended effect of the attacks on The Bell Curve will be to crystallize a debate that has long needed crystallizing. The cry that “Herrnstein and Murray are too pessimistic” is going to force a great many claims to be laid on the table for examination. Howard Gardner’s review in American Prospect took us to task for not citing Lisbeth and Daniel Schorr’s book, Within Our Reach, for example. I would be delighted to join in a rigorous look at the programs they describe and see whether we find among them hard evidence for long-term improvement in cognitive functioning.26 Let us bring up all the other nominees for inspection as well. In short, I hope that academicians and politicians alike use the furor over The Bell Curve finally to come to grips with how difficult it is, given the current state of knowledge, for outside interventions to make much difference in the environmental factors that nurture cognitive development.
A few weeks after The Bell Curve appeared, a reporter said to me that the real message of the book is, “Get serious.” I resisted his comment at first, but now I think he was right. We never quite say it in so many words, but the book’s subtext is that America’s discussion of social policy since the 1960s has been carried on in a never-never land where human beings are easily changed and society can eventually become a Lake Wobegon where all the children are above average. The Bell Curve does indeed imply that it is time to get serious about how best to accommodate the huge and often intractable individual differences that shape human society.
This is a counsel not of despair but of realism, including realistic hope. An individual’s g may not be as elastic as one would prefer, but the inventiveness of the species seems to have few bounds. In The Bell Curve, we are matter-of-fact about the limits facing low-IQ individuals in a postindustrial economy, but we also celebrate the capacity of people everywhere in the normal range on the bell curve to live morally autonomous, satisfying lives, if only the system will let them. Accepting the message of The Bell Curve does not mean giving up on improving social policy but thinking anew about how progress is to be achieved—and, even more fundamental, thinking anew about how progress is to be defined.
The verdict on the influence of The Bell Curve on policy is many years away. For now, the book may have another useful role to play that we did not anticipate. The attacks on it have often read like an unintentional confirmation of our view of the cognitive elite as a new caste, complete with high priests, dogmas, heresies, and apostates. But the violent response, unpleasant as it has been in the short run, is essential if The Bell Curve is to play its constructive role in the long run. The social science that deals in public policy has in the latter part of the twentieth century become self-censored and riddled with taboos—in a word, corrupt. Only the most profound, anguished, and divisive reexamination is going to change that situation, and it has to be done within the profession. Perhaps starting that reexamination will be The Bell Curve’s most important achievement.
20 June 1995