Introduction - The Bell Curve: Intelligence and Class Structure in American Life - Richard J. Herrnstein, Charles Murray (1996)


That the word intelligence describes something real and that it varies from person to person is as universal and ancient as any understanding about the state of being human. Literate cultures everywhere and throughout history have had words for saying that some people are smarter than others. Given the survival value of intelligence, the concept must be still older than that. Gossip about who in the tribe is cleverest has probably been a topic of conversation around the fire since fires, and conversation, were invented.

Yet for the last thirty years, the concept of intelligence has been a pariah in the world of ideas. The attempt to measure it with tests has been variously dismissed as an artifact of racism, political reaction, statistical bungling, and scholarly fraud. Many of you have reached this page assuming that these accusations are proved. In such a context comes this book, blithely proceeding on the assumption that intelligence is a reasonably well-understood construct, measured with accuracy and fairness by any number of standardized mental tests. The rest of this book can be better followed if you first understand why we can hold such apparently heterodox views, and for this it is necessary to know something about the story of measured intelligence.


Variation in intelligence became the subject of productive scientific study in the last half of the nineteenth century, stimulated, like so many other intellectual developments of that era, by Charles Darwin’s theory of evolution. Darwin had asserted that the transmission of inherited intelligence was a key step in human evolution, driving our simian ancestors apart from the other apes. Sir Francis Galton, Darwin’s young cousin and already a celebrated geographer in his own right, seized on this idea and set out to demonstrate its continuing relevance by using the great families of Britain as a primary source of data. He presented evidence that intellectual capacity of various sorts ran in families in Hereditary Genius, published just a decade after the appearance of Origin of Species in 1859. So began a long and deeply controversial association between intelligence and heredity that remains with us today.1

Galton realized that he needed a precise, quantitative measure of the mental qualities he was trying to analyze, and thus he was led to put in formal terms what most people had always taken for granted: People vary in their intellectual abilities and the differences matter, to them personally and to society.2 Not only are some people smarter than others, said Galton, but each person’s pattern of intellectual abilities is unique. People differ in their talents, their intellectual strengths and weaknesses, their preferred forms of imagery, their mental vigor.

Working from these observations, Galton tried to devise an intelligence test as we understand the term today: a set of items probing intellectual capacities that could be graded objectively. Galton had the idea that intelligence would surface in the form of sensitivity of perceptions, so he constructed tests that relied on measures of acuity of sight and hearing, sensitivity to slight pressures on the skin, and speed of reaction to simple stimuli. His tests failed, but others followed where Galton had led. His most influential immediate successor, a French psychologist, Alfred Binet, soon developed questions that attempted to measure intelligence by measuring a person’s ability to reason, draw analogies, and identify patterns.3 These tests, crude as they were by modern standards, met the key criterion that Galton’s tests could not: Their results generally accorded with common understandings of high and low intelligence.

By the end of the nineteenth century, mental tests in a form that we would recognize today were already in use throughout the British Commonwealth, the United States, much of continental Europe, and Japan.4 Then, in 1904, a former British Army officer named Charles Spearman made a conceptual and statistical breakthrough that has shaped both the development and much of the methodological controversy about mental tests ever since.5

By that time, considerable progress had been made in statistics. Unlike Galton in his early years, investigators in the early twentieth century had available to them an invaluable number, the correlation coefficient first devised by Galton himself in 1888 and elaborated by his disciple, Karl Pearson.6 Before the correlation coefficient was available, scientists could observe that two variables, such as height and weight, seemed to vary together (the taller the heavier, by and large), but they had no way of saying exactly how much they were related. With Pearson’s r, as the coefficient was labeled, they now could specify “how much” of a relationship existed, on a scale ranging from a minimum of −1 (for perfectly inverse relationships) to +1 (for perfectly direct relationships).
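The coefficient described above can be illustrated with a short computation. This is a sketch, not anything from the original text: it computes Pearson's r for two lists of numbers as the covariance of the two variables divided by the product of their standard deviations, which yields the scale from −1 to +1 the authors describe.

```python
def pearson_r(xs, ys):
    # Pearson's r: covariance of x and y divided by the product of
    # their standard deviations; ranges from -1 (perfectly inverse)
    # to +1 (perfectly direct).
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)
```

For example, heights and weights that rise in lockstep give r = +1, while values that fall as the others rise give r = −1; real data such as height and weight land somewhere in between.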

Spearman noted that as the data from many different mental tests were accumulating, a curious result kept turning up: If the same group of people took two different mental tests, anyone who did well (or poorly) on one test tended to do similarly well (or poorly) on the other. In statistical terms, the scores on the two tests were positively correlated. This outcome did not seem to depend on the specific content of the tests. As long as the tests involved cognitive skills of one sort or another, the positive correlations appeared. Furthermore, individual items within tests showed positive correlations as well. If there was any correlation at all between a pair of items, a person who got one of them right tended to get the other one right, and vice versa for those who got it wrong. In fact, the pattern was stronger than that. It turned out to be nearly impossible to devise items that plausibly measured some cognitive skill and were not positively correlated with other items that plausibly measured some cognitive skill, however disparate the pair of skills might appear to be.

The size of the positive correlations among the pairs of items in a test did vary a lot, however, and it was this combination—positive correlations throughout the correlation matrix, but of varying magnitudes—that inspired Spearman’s insight.7 Why are almost all the correlations positive? Spearman asked. Because, he answered, they are tapping into the same general trait. Why are the magnitudes different? Because some items are more closely related to this general trait than others.8

Spearman’s statistical method, an early example of what has since become known as factor analysis, is complex, and we will explore some of those complexities. But, for now, the basis for factor analysis can be readily understood. Insofar as two items tap into the same trait, they share something in common. Spearman developed a method for estimating how much sharing was going on in a given set of data. From almost any such collection of mental or academic test scores, Spearman’s method of analysis uncovered evidence for a unitary mental factor, which he named g, for “general intelligence.” The evidence for a general factor in intelligence was pervasive but circumstantial, based on statistical analysis rather than direct observation. Its reality therefore was, and remains, arguable.

Spearman then made another major contribution to the study of intelligence by defining what this mysterious g represented. He hypothesized that g is a general capacity for inferring and applying relationships drawn from experience. Being able to grasp, for example, the relationship between a pair of words like harvest and yield, or to recite a list of digits in reverse order, or to see what a geometrical pattern would look like upside down, are examples of tasks (and of test items) that draw on g as Spearman conceived of it. This definition of intelligence differed subtly from the more prevalent idea that intelligence is the ability to learn and to generalize what is learned. The course of learning is affected by intelligence, in Spearman’s view, but it was not the thing in itself. Spearmanian intelligence was a measure of a person’s capacity for complex mental work.

Meanwhile, other testers in Europe and America continued to refine mental measurement. By 1908, the concept of mental level (later called mental age) had been developed, followed in a few years by a slightly more sophisticated concept, the intelligence quotient. IQ at first was just a way of expressing a person’s (usually a child’s) mental level relative to his or her contemporaries. Later, as the uses of testing spread, IQ became a more general way to express a person’s intellectual performance relative to a given population. Already by 1917, soon after the concept of IQ was first defined, the U.S. Army was administering intelligence tests to classify and assign recruits for World War I. Within a few years, the letters “IQ” had entered the American vernacular, where they remain today as a universally understood synonym for intelligence.

To this point, the study of cognitive abilities was a success story, representing one of the rare instances in which the new soft sciences were able to do their work with a rigor not too far short of the standards of the traditional sciences. A new specialty within psychology was created, psychometrics. Although the debates among the psychometricians were often fierce and protracted, they produced an expanded understanding of what was involved in mental capacity. The concept of g survived, embedded in an increasingly complex theory of the structure of cognitive abilities.

Because intelligence tests purported to test rigorously an important and valued trait about people (including ourselves and our loved ones), IQ also became one of the most visible and controversial products of social science. The first wave of public controversy occurred during the first decades of the century, when a few testing enthusiasts proposed using the results of mental tests to support outrageous racial policies. Sterilization laws were passed in sixteen American states between 1907 and 1917, with the elimination of mental retardation being one of the prime targets of the public policy. “Three generations of imbeciles are enough,” Justice Oliver Wendell Holmes declared in an opinion upholding the constitutionality of such a law.9 It was a statement made possible, perhaps encouraged, by the new enthusiasm for mental testing.

In the early 1920s, the chairman of the House Committee on Immigration and Naturalization appointed an “Expert Eugenical Agent” for his committee’s work, a biologist who was especially concerned about keeping up the American level of intelligence by suitable immigration policies.10 An assistant professor of psychology at Princeton, Carl C. Brigham, wrote a book entitled A Study of American Intelligence using the results of the U.S. Army’s World War I mental testing program to conclude that an influx of immigrants from southern and eastern Europe would lower native American intelligence, and that immigration therefore should be restricted to Nordic stock (see the box about tests and immigration).11

Fact and Fiction About Immigration and Intelligence Testing

Two stories about early IQ testing have entered the folklore so thoroughly that people who know almost nothing else about that history bring them up at the beginning of almost any discussion of IQ. The first story is that Jews and other immigrant groups were thought to be below average in intelligence, even feebleminded, which goes to show how untrustworthy such tests (and the testers) are. The other story is that IQ tests were used as the basis for the racist immigration policies of the 1920s, which shows how dangerous such tests (and the testers) are.12

The first is based on the work done at Ellis Island by H. H. Goddard, who explicitly preselected his sample for evidence of low intelligence (his purpose was to test his test’s usefulness in screening for feeblemindedness), and did not try to draw any conclusions about the general distribution of intelligence in immigrant groups.13 The second has a stronger circumstantial case: Brigham published his book just a year before Congress passed the Immigration Restriction Act of 1924, which did indeed tip the flow of immigrants toward the western and northern Europeans. The difficulty with making the causal case is that a close reading of the hearings for the bill shows no evidence that Brigham’s book in particular or IQ tests in general played any role.14

Critics responded vocally. Young Walter Lippmann, already an influential columnist, was one of the most prominent, fearing power-hungry intelligence testers who yearned to “occupy a position of power which no intellectual has held since the collapse of theocracy.”15 In a lengthy exchange in the New Republic in 1922 and 1923 with Lewis Terman, premier American tester of the time and the developer of the Stanford-Binet IQ test, Lippmann wrote, “I hate the impudence of a claim that in fifty minutes you can judge and classify a human being’s predestined fitness in life. I hate the pretentiousness of that claim. I hate the abuse of scientific method which it involves. I hate the sense of superiority which it creates, and the sense of inferiority which it imposes.”16

Lippmann’s characterization of the tests and the testers was sometimes unfair and often factually wrong, as Terman energetically pointed out.17 But while Terman may have won the technical arguments, Lippmann was right to worry that many people were eager to find connections between the results of testing and the more chilling implications of social Darwinism. Even if the psychometricians generally made modest claims for how much the tests predicted, it remained true that “IQ”—that single number with the memorable label—was seductive. As Lippmann feared, people did tend to give more credence to an individual’s specific IQ score and make broader generalizations from it than was appropriate. And not least, there was plenty to criticize in the psychometricians’ results. The methods for collecting and analyzing quantitative psychological data were still new, and some basic inferential mistakes were made.

If the tests had been fatally flawed or merely uninformative, they would have vanished. Why this did not happen is one of the stories we will be telling, but we may anticipate by observing that the use of tests endured and grew because society’s largest institutions—schools, military forces, industries, governments—depend significantly on measurable individual differences. Much as some observers wished it were not true, there is often a need to assess differences between people as objectively, fairly, and efficiently as possible, and even the early mental tests often did a better job of it than any of the alternatives.

During the 1930s, mental tests evolved and improved as their use continued to spread throughout the world. David Wechsler worked on the initial version of the tests that would eventually become the Wechsler Adult Intelligence Scale and the Wechsler Intelligence Scale for Children, the famous WAIS and WISC. Terman and his associates published an improved version of the Stanford-Binet. But these tests were individually administered and had to be scored by trained personnel, and they were therefore too expensive to administer to large groups of people. Psychometricians and test publishers raced to develop group-administered tests that could be graded by machine. In the search for practical, economical measurements of intelligence, testing grew from a cottage industry to big business.

World War II stimulated another major advance in the state of the art, as psychologists developed paper-and-pencil tests that could accurately identify specific military aptitudes, even ones that included a significant element of physical aptitude (such as an aptitude for flying airplanes). Shortly after the war, psychologists at the University of Minnesota developed the Minnesota Multiphasic Personality Inventory, the first machine-gradable standardized test with demonstrated validity as a predictor of various personality disorders. Later came the California Psychological Inventory, which measured personality characteristics within the normal range—“social presence” and “self-control,” for example. The testing industry was flourishing, and the annual Mental Measurements Yearbook that cataloged the tests grew to hundreds of pages. Hundreds of millions of people throughout the world were being psychologically tested every year.

Attacks on testing faded into the background during this period. Though some psychometricians must have known that the tests were capturing human differences that had unsettling political and social implications, no one of any stature was trying to use the results to promote discriminatory, let alone eugenic, laws. And though many intellectuals outside the testing profession knew of these results, the political agendas of the 1940s and 1950s, whether of New Deal Democrats or Eisenhower Republicans, were more pragmatic than ideological. Yes, intelligence varied, but this was a fact of life that seemed to have little bearing on the way public policy was conducted.


Then came the 1960s, and a new controversy about intelligence tests that continues to this day. It arose not from new findings but from a new outlook on public policy. Beginning with the rise of powerful social democratic and socialist movements after World War I and accelerating across the decades until the 1960s, a fundamental shift was taking place in the received wisdom regarding equality. This was most evident in the political arena, where the civil rights movement and then the War on Poverty raised Americans’ consciousness about the nature of the inequalities in American society. But the changes in outlook ran deeper and broader than politics. Assumptions about the very origins of social problems changed profoundly. Nowhere was the shift more pervasive than in the field of psychology.

Psychometricians of the 1930s had debated whether intelligence is almost entirely produced by genes or whether the environment also plays a role. By the 1960s and 1970s the point of contention had shifted dramatically. It had somehow become controversial to claim, especially in public, that genes had any effect at all on intelligence. Ironically, the evidence for genetic factors in intelligence had greatly strengthened during the very period when the terms of the debate were moving in the other direction.

In the psychological laboratory, there was a similar shift. Psychological experimenters early in the century were, if anything, more likely to concentrate on the inborn patterns of human and animal behavior than on how the learning process could change behavior.18 But from the 1930s to the 1960s, the leading behaviorists, as they were called, and their students and disciples were almost all specialists in learning theory. They filled the technical journals with the results of learning experiments on rats and pigeons, the tacit implication being that genetic endowment mattered so little that we could ignore the differences among species, let alone among human individuals, and still discover enough about the learning process to make it useful and relevant to human concerns.19 There are, indeed, aspects of the learning process that cross the lines between species, but there are also enormous differences, and these differences were sometimes ignored or minimized when psychologists explained their findings to the lay public. B. F. Skinner, at Harvard University, more than any other of the leading behaviorists, broke out of the academic world into public attention with books that applied the findings of laboratory research on animals to human society at large.20

To those who held the behaviorist view, human potential was almost perfectly malleable, shaped by the environment. The causes of human deficiencies in intelligence—or parenting, or social behavior, or work behavior—lay outside the individual. They were caused by flaws in society. Sometimes capitalism was blamed, sometimes an uncaring or incompetent government. Further, the causes of these deficiencies could be fixed by the right public policies—redistribution of wealth, better education, better housing and medical care. Once these environmental causes were removed, the deficiencies should vanish as well, it was argued.

The contrary notion—that individual differences could not easily be diminished by government intervention—collided head-on with the enthusiasm for egalitarianism, which itself collided head-on with a half-century of IQ data indicating that differences in intelligence are intractable and significantly heritable and that the average IQ of various socioeconomic and ethnic groups differs.

In 1969, Arthur Jensen, an educational psychologist and expert on testing from the University of California at Berkeley, put a match to this volatile mix of science and ideology with an article in the Harvard Educational Review.21 Asked by the Review’s editors to consider why compensatory and remedial education programs begun with such high hopes during the War on Poverty had yielded such disappointing results, Jensen concluded that the programs were bound to have little success because they were aimed at populations of youngsters with relatively low IQs, and success in school depended to a considerable degree on IQ. IQ had a large heritable component, Jensen also noted. The article further disclosed that the youngsters in the targeted populations were disproportionately black and that historically blacks as a population had exhibited average IQs substantially below those of whites.

The reaction to Jensen’s article was immediate and violent. From 1969 through the mid-1970s, dozens of books and hundreds of articles appeared denouncing the use of IQ tests and arguing that mental abilities are determined by environment, with the genes playing a minor role and race none at all. Jensen’s name became synonymous with a constellation of hateful ways of thinking. “It perhaps is impossible to exaggerate the importance of the Jensen disgrace,” wrote Jerry Hirsch, a psychologist specializing in the genetics of animal behavior who was among Jensen’s more vehement critics. “It has permeated both science and the universities and hoodwinked large segments of government and society. Like Vietnam and Watergate, it is a contemporary symptom of serious affliction.”22 The title of Hirsch’s article was “The Bankruptcy of ‘Science’ Without Scholarship.” During the first few years after the Harvard Educational Review article was published, Jensen could appear in no public forum in the United States without triggering something perilously close to a riot.

The uproar was exacerbated by William Shockley, who had won the Nobel Prize in physics for his contributions to the invention of the transistor but had turned his attention to human variation toward the end of his career. As eccentric as he was brilliant, he often recalled the eugenicists of the early decades of the century. He proposed, as a “thought exercise,” a scheme for paying people with low IQs to be sterilized.23 He supported (and contributed to) a sperm bank for geniuses. He seemed to relish expressing sensitive scientific findings in a way that would outrage or disturb as many people as possible. Jensen and Shockley, utterly unlike as they were in most respects, soon came to be classed together as a pair of racist intellectual cranks.

Then one of us, Richard Herrnstein, an experimental psychologist at Harvard, strayed into forbidden territory with an article in the September 1971 Atlantic Monthly.24 Herrnstein barely mentioned race, but he did talk about heritability of IQ. His proposition, put in the form of a syllogism, was that because IQ is substantially heritable, because economic success in life depends in part on the talents measured by IQ tests, and because social standing depends in part on economic success, it follows that social standing is bound to be based to some extent on inherited differences. By 1971, this had become a controversial thing to say. In media accounts of intelligence, the names Jensen, Shockley, and Herrnstein became roughly interchangeable.

That same year, 1971, the U.S. Supreme Court outlawed the use of standardized ability tests by employers unless they had a “manifest relationship” to the specific job in question because, the Supreme Court held, standardized tests acted as “built-in headwinds” for minority groups, even in the absence of discriminatory intent.25 A year later, the National Education Association called upon the nation’s schools to impose a moratorium on all standardized intelligence testing, hypothesizing that “a third or more of American citizens are intellectually folded, mutilated or spindled before they have a chance to get through elementary school because of linguistically or culturally biased standardized tests.”26 A movement that had begun in the 1960s gained momentum in the early 1970s, as major school systems throughout the country, including those of Chicago, New York, and Los Angeles, limited or banned the use of group-administered standardized tests in public schools. A number of colleges announced that they would no longer require the Scholastic Aptitude Test as part of the admissions process. The legal movement against tests reached its apogee in 1978 in the case of Larry P., in which Judge Robert Peckham of the U.S. District Court in San Francisco ruled that it was unconstitutional to use IQ tests for placement of children in classes for the educably mentally retarded if the use of those tests resulted in placement of “grossly disproportionate” numbers of black children.27

Meanwhile, the intellectual debate had taken a new and personalized turn. Those who claimed that intelligence was substantially inherited were not just wrong, the critics now discovered, they were charlatans as well. Leon Kamin, a psychologist then at Princeton, opened this phase of the debate with a 1974 book, The Science and Politics of IQ. “Patriotism, we have been told, is the last refuge of scoundrels,” Kamin wrote in the opening pages. “Psychologists and biologists might consider the possibility that heritability is the first.”28 Kamin went on to charge that mental testing and belief in the heritability of IQ in particular had been fostered by people with right-wing political views and racist social views. They had engaged in pseudoscience, he wrote, suppressing the data they did not like and exaggerating the data that agreed with their preconceptions. Examined carefully, the case for the heritability of IQ was nil, concluded Kamin.

In 1976, a British journalist, Oliver Gillie, published an article in the London Sunday Times that seemed to confirm Kamin’s thesis with a sensational revelation: The recently deceased Cyril Burt, Britain’s most eminent psychometrician, author of the largest and most famous study of the intelligence of identical twins who grew up apart, was charged with fraud.29 He had made up data, fudged his results, and invented coauthors, the Sunday Times declared. The subsequent scandal was as big as the Piltdown Man hoax. Cyril Burt had not been just another researcher but one of the giants of twentieth-century psychology. Nor could his colleagues find a ready defense (the defense came later, as described in the box). They protested that the revelations did not compromise the great bulk of the work that bore on the issue of heritability, but their defenses sounded feeble in the light of the suspicions that had preceded Burt’s exposure.

For the public observing the uproar in the academy from the sidelines, the capstone of the assault on the integrity of the discipline occurred in 1981 when Harvard paleobiologist Stephen Jay Gould, author of several popular books on biology, published The Mismeasure of Man.32 Gould examined the history of intelligence testing, found that it was peopled by charlatans, racists, and self-deluded fools, and concluded that “determinist arguments for ranking people according to a single scale of intelligence, no matter how numerically sophisticated, have recorded little more than social prejudice.”33 The Mismeasure of Man became a best-seller and won the National Book Critics Circle Award.

The Burt Affair

It would be more than a decade before the Burt affair was subjected to detailed reexamination. In 1989 and 1991, two accounts of the Burt allegations, by psychologist Robert Joynson and sociologist Ronald Fletcher, written independently, concluded that the attacks against Burt had been motivated by a mixture of professional and ideological antagonism and that no credible case of data falsification or fictitious research or researchers had ever been presented.30 Both authors also concluded that some of Burt’s leading critics were aware that their accusations were inaccurate even at the time they made them. An ironic afterword centers on Burt’s claim that the correlation between the IQs of identical twins reared apart is +.77. A correlation this large almost irrefutably supports a large genetic influence on IQ. Since the attacks on Burt began, it had been savagely derided as fraudulent, the product of Burt’s fiddling with the data to make his case. In 1990, the Minnesota twin study, accepted by most scholars as a model of its kind, produced its most detailed estimates of the correlation of IQ between identical twins reared apart. The procedure that most closely paralleled Burt’s yielded a correlation of +.78.31

Gould and his allies had won the visible battle. By the early 1980s, a new received wisdom about intelligence had been formed that went roughly as follows:

Intelligence is a bankrupt concept. Whatever it might mean—and nobody really knows even how to define it—intelligence is so ephemeral that no one can measure it accurately. IQ tests are, of course, culturally biased, and so are all the other “aptitude” tests, such as the SAT. To the extent that tests such as IQ and SAT measure anything, it certainly is not an innate “intelligence.” IQ scores are not constant; they often change significantly over an individual’s life span. The scores of entire populations can be expected to change over time—look at the Jews, who early in the twentieth century scored below average on IQ tests and now score well above the average. Furthermore, the tests are nearly useless as tools, as confirmed by the well-documented fact that such tests do not predict anything except success in school. Earnings, occupation, productivity—all the important measures of success—are unrelated to the test scores. All that tests really accomplish is to label youngsters, stigmatizing the ones who do not do well and creating a self-fulfilling prophecy that injures the socioeconomically disadvantaged in general and blacks in particular.


As far as public discussion is concerned, this collection of beliefs, with some variations, remains the state of wisdom about cognitive abilities and IQ tests. It bears almost no relation to the current state of knowledge among scholars in the field, however, and therein lies a tale. The dialogue about testing has been conducted at two levels during the last two decades—the visible one played out in the press and the subterranean one played out in the technical journals and books.

The case of Arthur Jensen is illustrative. To the public, he surfaced briefly, published an article that was discredited, and fell back into obscurity. Within the world of psychometrics, however, he continued to be one of the profession’s most prolific scholars, respected for his meticulous research by colleagues of every theoretical stripe. Jensen had not recanted. He continued to build on the same empirical findings that had gotten him into such trouble in the 1960s, but primarily in technical publications, where no one outside the profession had to notice. The same thing was happening throughout psychometrics. In the 1970s, scholars observed that colleagues who tried to say publicly that IQ tests had merit, or that intelligence was substantially inherited, or even that intelligence existed as a definable and measurable human quality, paid too high a price. Their careers, family lives, relationships with colleagues, and even physical safety could be jeopardized by speaking out. Why speak out when there was no compelling reason to do so? Research on cognitive abilities continued to flourish, but only in the sanctuary of the ivory tower.

In this cloistered environment, the continuing debate about intelligence was conducted much as debates are conducted within any other academic discipline. The public controversy had surfaced some genuine issues, and the competing parties set about trying to resolve them. Controversial hypotheses were put to the test. Sometimes they were confirmed, sometimes rejected. Often they led to new questions, which were then explored. Substantial progress was made. Many of the issues that created such a public furor in the 1970s were resolved, and the study of cognitive abilities went on to explore new areas.

This is not to say that controversy has ended, only that the controversy within the professional intelligence testing community is much different from that outside it. The issues that seem most salient in articles in the popular press (Isn’t intelligence determined mostly by environment? Aren’t the tests useless because they’re biased?) are not major topics of debate within the profession. On many of the publicly discussed questions, a scholarly consensus has been reached.34 Rather, the contending parties within the professional community divide along other lines. By the early 1990s, they could be roughly divided into three factions for our purposes: the classicists, the revisionists, and the radicals.

The Classicists: Intelligence as a Structure

The classicists work within the tradition begun by Spearman, seeking to identify the components of intelligence much as physicists seek to identify the structure of the atom. As of the 1990s, the classicists are for practical purposes unanimous in accepting that g sits at the center of the structure in a dominating position—not just as an artifact of statistical manipulation but as an expression of a core human mental ability much like the ability Spearman identified at the turn of the century. In their view, g is one of the most thoroughly demonstrated entities in the behavioral sciences and one of the most powerful for understanding socially significant human variation.

The classicists took a long time to reach this level of consensus. The ink on Spearman’s first article on the topic in 1904 was barely dry before others were arguing that intellectual ability could not be adequately captured by g or by any other unitary quantity—and understandably so, for common sense rebels against the idea that something so important about people as their intellects can be captured even roughly by variations in a single quantity. Many of the famous names in the history of psychometrics challenged the reality of g, starting with Galton’s most eminent early disciple, Karl Pearson, and continuing with many other creative and influential psychometricians.

In diverse ways, they sought the grail of a set of primary and mutually independent mental abilities. For Spearman, there was just one such primary ability, g. For Raymond Cattell, there are two kinds of g, crystallized and fluid, with crystallized g being general intelligence transformed into the skills of one’s own culture, and fluid g being the all-purpose intellectual capacity from which the crystallized skills are formed. In Louis Thurstone’s theory of intelligence, there are a half-dozen or so primary mental abilities, such as verbal, quantitative, spatial, and the like. In Philip Vernon’s theory, intellectual capacities are arranged in a hierarchy with g at its apex; in Joy Guilford’s, the structure of intellect is refined into 120 or more intellectual components. The theoretical alternatives to unitary, general intelligence have come in many sizes, shapes, and degrees of plausibility.

Many of these efforts proved to have lasting value. For example, Cattell’s distinction between fluid and crystallized intelligence remains a useful conceptual contrast, just as other work has done much to clarify what lies in the domain of specific abilities that g cannot account for. But no one has been able to devise a set of tests that do not reveal a large general factor of intellectual ability—in other words, something very like Spearman’s g. Furthermore, the classicists point out, the best standardized tests, such as a modern IQ test, do a reasonably good job of measuring g. When properly administered, the tests are not measurably biased against socioeconomic, ethnic, or racial subgroups. They predict a wide variety of socially important outcomes.

This is not the same as saying that the classicists are satisfied with their understanding of intelligence, g is a statistical entity, and current research is probing the underlying neurologic basis for it. Arthur Jensen, the archetypal classicist, has been active in this effort for the last decade, returning to Galton’s intuition that performance on elementary cognitive tasks, such as reaction time in recognizing simple patterns of lights and shapes, provides an entry point into understanding the physiology of g.

The Revisionists: Intelligence as Information Processing

A theory of intelligence need not be structural. The emphasis may be on process rather than on structure. In other words, it may try to figure out what a person is doing when exercising his or her intelligence, rather than what elements of intelligence are put together. The great Swiss psychologist, Jean Piaget, started his career in Alfred Binet’s laboratory trying to adapt Cyril Burt’s intelligence tests for Parisian children. Piaget discovered quickly that he was less interested in how well the children did than in what errors they made.35 Errors revealed what the underlying processes of thought must have been, Piaget believed. It was the processes of intelligence that fascinated him during his long and illustrious career, which led in time to his theory of the stages of cognitive development.

Starting in the 1960s, research on human cognition became the preoccupation of experimental psychologists, displacing the animal learning experiments of the earlier period. It was inevitable that the new experimentalists would turn to the study of human intelligence in natural settings. John B. Carroll and Earl B. Hunt led the way from the cognition laboratory to the study of human intelligence in everyday life. Today Yale psychologist Robert Sternberg is among the leaders of this development.

The revisionists share much with the classicists. They accept that a general mental ability much like Spearman’s g has to be incorporated into any theory of the structure of intelligence, although they would not agree that it accounts for as much of the intellectual variation among people as many classicists claim. They use many of the same statistical tools as the classicists and are prepared to subject their work to the same standards of rigor. Where they differ with the classicists, however, is in their attitude toward intellectual structure and the tests used to measure it.

Yes, the revisionists argue, human intelligence has a structure, but is it worth investing all that effort in discovering what it is? The preoccupation with structure has engendered preoccupation with summary scores, the revisionists say. That, after all, is what an IQ score represents: a composite of scores that individually measure quite distinct intellectual processes. “Of course,” Sternberg writes, “a tester can always average over multiple scores. But are such averages revealing, or do they camouflage more than they reveal? If a person is a wonderful visualizer but can barely compose a sentence, and another person can write glowing prose but cannot begin to visualize the simplest spatial images, what do you really learn about these two people if they are reported to have the same IQ?”36

By focusing on processes, the revisionists argue, they are working richer veins than are those who search for static structure. What really counts about intelligence are the ways in which people process the information they receive. What problem-solving mechanisms do they employ? How do they trade off speed and accuracy? How do they combine different problem-solving resources into a strategy? Sternberg has fashioned his own thinking on this topic into what he calls a “triarchy of intelligence,” or “three aspects of human information processing.”37

The first part of Sternberg’s triarchy attempts to describe the internal architecture of intellectual functioning, the means by which humans translate sensory inputs into mental representations, allocate mental resources, infer conclusions from raw material, and acquire skills. This architectural component of Sternberg’s theory bears a family resemblance to the classicists’ view of the dimensions of intelligence, but it emphasizes process over structure.

The second part of the triarchic theory addresses the role of intelligence in routinizing performance, starting with completely novel tasks that test a person’s insightfulness, flexibility, and creativity, and eventually converting them to routine tasks that can be done without conscious thought. Understand this process, Sternberg argues, and we have leverage not just for measuring intelligence but for improving it.

The third part of Sternberg’s triarchy attacks the question that has been central to the controversy over intelligence tests: the relationship of intelligence to the real world in which people function. In Sternberg’s view, people function by means of three mechanisms: adapting (roughly, trying to make the best of the situation), shaping the external environment so that it conforms more closely to the desired state of affairs, and selecting a new environment altogether. Sternberg laments the inadequacies of traditional intelligence tests in capturing this real-world aspect of intelligence and seeks to develop tests that will do so—and, in addition, lead to techniques for teaching people to raise their intelligence.

The Radicals: The Theory of Multiple Intelligences

Walter Lippmann’s hostility toward intelligence testing was grounded in his belief that this most important of all human qualities was too diverse, too complex, too changeable, too dependent on cultural context, and, above all, too subjective to be measured by answers to a mere list of test questions. Intelligence seemed to him, as it does to many other thoughtful people who are not themselves expert in testing, more like beauty or justice than height or weight. Before something can be measured, it must be defined, this argument goes.38 And the problems of definition for beauty, justice, or intelligence are insuperable. To people who hold these views, the claims of the intelligence testers seem naive at best and vicious at worst. These views, advanced primarily by nonspecialists, have found an influential spokesman from the academy, which is mainly why we include them here. We refer here to the theory of multiple intelligences formulated by Howard Gardner, a Harvard psychologist.

Gardner’s general definition of intelligent behavior does not seem radical at all. For Gardner, as for many other thinkers on intelligence, the notion of problem solving is central. “A human intellectual competence must entail a set of skills of problem solving,” he writes, “enabling the individual to resolve genuine problems or difficulties that he or she encounters and, when appropriate, to create an effective product—and also must entail the potential for finding or creating problems—thereby laying the groundwork for the acquisition of new knowledge.”39

Gardner’s view is radical (a word he uses himself to describe his theory) in that he rejects, virtually without qualification, the notion of a general intelligence factor, which is to say that he denies g. Instead, he argues the case for seven distinct intelligences: linguistic, musical, logical-mathematical, spatial, bodily-kinesthetic, and two forms of “personal intelligence,” the intrapersonal and the interpersonal, each based on its own unique computational capacity.40 Gardner rejects the criticism that he has merely redefined the word intelligence by broadening it to include what may more properly be called talents: “I place no particular premium on the word intelligence, but I do place great importance on the equivalence of various human faculties,” he writes. “If critics [of his theory] were willing to label language and logical thinking as talents as well, and to remove these from the pedestal they currently occupy, then I would be happy to speak of multiple talents.”41

Gardner’s approach is also radical in that he does not defend his theory with quantitative data. He draws on findings from anthropology to zoology in his narrative, but, in a field that has been intensely quantitative since its inception, Gardner’s work is uniquely devoid of psychometric or other quantitative evidence. He dismisses factor analysis: “[G]iven the same set of data, it is possible, using one set of factor-analytic procedures, to come up with a picture that supports the idea of a ‘g’ factor; using another equally valid method of statistical analysis, it is possible to support the notion of a family of relatively discrete mental abilities.”42 He is untroubled by the fact that tests of the varying intelligences in his theory seem to be intercorrelated: “I fear… that I cannot accept these correlations at face value. Nearly all current tests are so devised that they call principally upon linguistic and logical facility. … Accordingly, individuals with these skills are likely to do well even in tests of musical or spatial abilities, while those who are not especially facile linguistically and logically are likely to be impaled on such standardized tests.”43 And in general, he invites his readers to disregard the thorny complexities of the classical and revisionist approaches: “When it comes to the interpretation of intelligence testing, we are faced with an issue of taste or preference rather than one on which scientific closure is likely to be reached.”44


Given these different ways of understanding intelligence, you will naturally ask where our sympathies lie and how they shape this book.

We will be drawing most heavily from the classical tradition. That body of scholarship represents an immense and rigorously analyzed body of knowledge. By accepted standards of what constitutes scientific evidence and scientific proof, the classical tradition has in our view given the world a treasure of information that has been largely ignored in trying to understand contemporary policy issues. Moreover, because our topic is the relationship of human abilities to public policy, we will be dealing in relationships that are based on aggregated data, which is where the classical tradition has the most to offer. Perhaps an example will illustrate what we mean.

Suppose that the question at issue regards individuals: “Given two 11-year-olds, one with an IQ of 110 and one with an IQ of 90, what can you tell us about the differences between those two children?” The answer must be phrased very tentatively. On many important topics, the answer must be, “We can tell you nothing with any confidence.” It is well worth a guidance counselor’s time to know what these individual scores are, but only in combination with a variety of other information about the child’s personality, talents, and background. The individual’s IQ score all by itself is a useful tool but a limited one.

Suppose instead that the question at issue is: “Given two sixth-grade classes, one for which the average IQ is 110 and the other for which it is 90, what can you tell us about the difference between those two classes and their average prospects for the future?” Now there is a great deal to be said, and it can be said with considerable confidence—not about any one person in either class but about average outcomes that are important to the school, educational policy in general, and society writ large. The data accumulated under the classical tradition are extremely rich in this regard, as will become evident in subsequent chapters.

If instead we were more concerned with the development of cognitive processes than with aggregate social and economic outcomes, we would correspondingly spend more time discussing the work of the revisionists. That we do not reflects our focus, not a dismissal of their work.

With regard to the radicals and the theory of multiple intelligences, we share some common ground. Socially significant individual differences include a wide range of human talents that do not fit within the classical conception of intelligence. For certain spheres of life, they matter profoundly. And even beyond intelligence and talents, people vary temperamentally, in personality, style, and character. But we confess to reservations about using the word intelligence to describe such factors as musical abilities, kinesthetic abilities, or personal skills. It is easy to understand how intelligence (ordinarily understood) is part of some aspects of each of those human qualities—obviously, Bach was engaging in intelligent activity, and so was Ted Williams, and so is a good used-car salesman—but the part intelligence plays in these activities is captured fairly well by intelligence as the classicists and revisionists conceive of it. In the case of music and kinesthetics, talent is a word with a domain and weight of its own, and we are unclear why we gain anything by discarding it in favor of another word, intelligence, that has had another domain and weight. In the case of intrapersonal and interpersonal skills, conventional intelligence may play some role, and, to the extent that other human qualities matter, words like sensitivity, charm, persuasiveness, insight—the list could go on and on—have accumulated over the centuries to describe them. We lose precision by using the word intelligence to cover them all. Similarly, the effect that an artist or an athlete or a salesman creates is complex, with some aspects that may be dominated by specific endowments or capacities, others that may be the product of learned technique, others that may be linked to desires and drives, and still others that are characteristic of the kind of cognitive ability denoted by intelligence. Why try to make intelligence do triple or quadruple duty?

We agree emphatically with Howard Gardner, however, that the concept of intelligence has taken on a much higher place in the pantheon of human virtues than it deserves. One of the most insidious but also widespread errors regarding IQ, especially among people who have high IQs, is the assumption that another person’s intelligence can be inferred from casual interactions. Many people conclude that if they see someone who is sensitive, humorous, and talks fluently, the person must surely have an above-average IQ.

This identification of IQ with attractive human qualities in general is unfortunate and wrong. Statistically, there is often a modest correlation with such qualities. But modest correlations are of little use in sizing up other individuals one by one. For example, a person can have a terrific sense of humor without giving you a clue about where he is within thirty points on the IQ scale. Or a plumber with a measured IQ of 100—only an average IQ—can know a great deal about the functioning of plumbing systems. He may be able to diagnose problems, discuss them articulately, make shrewd decisions about how to fix them, and, while he is working, make some pithy remarks about the president’s recent speech.

At the same time, high intelligence has earmarks that correspond to a first approximation to the commonly understood meaning of smart. In our experience, people do not use smart to mean (necessarily) that a person is prudent or knowledgeable but rather to refer to qualities of mental quickness and complexity that do in fact show up in high test scores. To return to our examples: Many witty people do not have unusually high test scores, but someone who regularly tosses off impromptu complex puns probably does (which does not necessarily mean that such puns are very funny, we hasten to add). If the plumber runs into a problem he has never seen before and diagnoses its source through inferences from what he does know, he probably has an IQ of more than 100 after all. In this, language tends to reflect real differences: In everyday language, people who are called very smart tend to have high IQs.

All of this is another way of making a point so important that we will italicize it now and repeat elsewhere: Measures of intelligence have reliable statistical relationships with important social phenomena, but they are a limited tool for deciding what to make of any given individual. Repeat it we must, for one of the problems of writing about intelligence is how to remind readers often enough how little an IQ score tells about whether the human being next to you is someone whom you will admire or cherish. This thing we know as IQ is important but not a synonym for human excellence.

Idiot Savants and Other Anomalies

To add one final complication, it is also known that some people with low measured IQ occasionally engage in highly developed, complex cognitive tasks. So-called idiot savants can (for example) tell you on what day Easter occurred in any of the past or future two thousand years.45 There are also many less exotic examples. For example, a study of successful track bettors revealed that some of them who used extremely complicated betting systems had below-average IQs and that IQ was not correlated with success.46 The trick in interpreting such results is to keep separate two questions: (1) If one selects people who have already demonstrated an obsession and success with racetrack betting systems, will one find a relationship with IQ (the topic of the study in question)? versus (2) if one selects a thousand people at random and asks them to develop racetrack betting systems, will there be a relationship with IQ (in broad terms, the topic of this book)?

Howard Gardner has also convinced us that the word intelligence carries with it undue affect and political baggage. It is still a useful word, but we shall subsequently employ the more neutral term cognitive ability as often as possible to refer to the concept that we have hitherto called intelligence, just as we will use IQ as a generic synonym for intelligence test score. Since cognitive ability is an uneuphonious phrase, we lapse often so as to make the text readable. But at least we hope that it will help you think of intelligence as just a noun, not an accolade.

We have said that we will be drawing most heavily on data from the classical tradition. That implies that we also accept certain conclusions undergirding that tradition. To draw the strands of our perspective together and to set the stage for the rest of the book, let us set them down explicitly. Here are six conclusions regarding tests of cognitive ability, drawn from the classical tradition, that are by now beyond significant technical dispute:

1. There is such a thing as a general factor of cognitive ability on which human beings differ.

2. All standardized tests of academic aptitude or achievement measure this general factor to some degree, but IQ tests expressly designed for that purpose measure it most accurately.

3. IQ scores match, to a first degree, whatever it is that people mean when they use the word intelligent or smart in ordinary language.

4. IQ scores are stable, although not perfectly so, over much of a person’s life.

5. Properly administered IQ tests are not demonstrably biased against social, economic, ethnic, or racial groups.

6. Cognitive ability is substantially heritable, apparently no less than 40 percent and no more than 80 percent.

All six points have an inverse worth noting. For example, some people’s scores change a lot; cognitive ability is not synonymous with test scores or with a single general mental factor, and so on. When we say that all are “beyond significant technical dispute,” we mean, in effect, that if you gathered the top experts on testing and cognitive ability, drawn from all points of view, to argue over these points, away from television cameras and reporters, it would quickly become apparent that a consensus already exists on all of the points, in some cases amounting to near unanimity. And although dispute would ensue about some of the points, one side—the side represented by the way the points are stated—would have a clear preponderance of evidence favoring it, and those of another viewpoint would be forced to lean heavily on isolated studies showing anomalous results.

This does not mean that the experts should leave the room with their differences resolved. All six points can be accurate as general rules and still leave room for differences in the theoretical and practical conclusions that people of different values and perspectives draw from them (and from the mass of material about cognitive ability and testing not incorporated in the six points). Radicals in the Gardner mold might still balk at all the attention being paid to intelligence as the tests measure it. But these points, in themselves, are squarely in the middle of the scientific road.

Having said this, however, we are left with a dilemma. The received wisdom in the media is roughly 180 degrees opposite from each of the six points. Proving our case by taking each point and amassing a full account of the evidence for and against it would lead us to write a book just about them. Such books have already been written. There is no point in our trying to duplicate them.47

We have taken two steps to help you form your own judgments within the limits of this book. First, we deal with specific issues involving the six points as they arise in the natural course of the discussion—cultural bias when discussing differences in scores across ethnic groups, for example. Second, we try to provide a level of detail that will satisfy different levels of technical curiosity through the use of boxed material (you have already come across some examples), notes, and appendixes. Because we expect (and fear) that many readers will go directly to chapters that especially interest them rather than read the book from cover to cover, we also insert periodic reminders about where discussion of certain key topics may be found.