Notes - The Bell Curve: Intelligence and Class Structure in American Life - Richard J. Herrnstein, Charles Murray

The Bell Curve: Intelligence and Class Structure in American Life - Richard J. Herrnstein, Charles Murray (1996)




National Center for Education Statistics, Digest of Education Statistics. Published annually, Washington, D.C.: Government Printing Office.


National Longitudinal Survey of Youth. Center for Human Resource Research, Ohio State University, Columbus, Ohio.


U.S. Bureau of the Census. Statistical Abstract of the United States. Published annually, Washington, D.C.: Government Printing Office. For each cite in the text, we have added the year of the edition and table numbers to the abbreviation; e.g., “DES, 19xx, Table xx.”


1 Galton 1869.

2 Forrest 1974.

3 For a brief history of testing from Galton on, see Herrnstein and Boring 1965.

4 In China, civil service examinations that functioned de facto as intelligence tests—though overweighted with pure memory questions—had been in use for more than a thousand years.

5 Spearman 1904.

6 Galton 1888; Stigler 1986.

7 A correlation matrix is the set of all pairs of correlations. For example, in a 20-item test, each item will have 19 unique correlations with the other items, and the total matrix will contain 190 unique correlations (of Item 1 with Item 2, Item 1 with Item 3, etc.).
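The count in note 7 follows from the number of distinct pairs among n items, n(n - 1)/2. A minimal Python sketch (the function name is ours, introduced only for illustration) enumerates the pairs for a 20-item test and checks them against that formula:

```python
from itertools import combinations


def unique_correlation_count(n_items: int) -> int:
    """Number of distinct item pairs in an n-item correlation matrix."""
    return n_items * (n_items - 1) // 2


n_items = 20
# Every unordered pair (Item 1 with Item 2, Item 1 with Item 3, ...).
pairs = list(combinations(range(1, n_items + 1), 2))
# Pairs that involve one particular item, here Item 1.
pairs_with_item_1 = [p for p in pairs if 1 in p]

print(len(pairs_with_item_1))  # 19
print(len(pairs))              # 190
```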

8 We are glossing over many complexities, including the effects of varying reliabilities for the items or tests. Spearman understood, and took account of, the contribution of reliability variations.

9 Buck v. Bell, 1927.

10 This was Harry Laughlin, whose story is told in Kevles 1985.

11 Brigham 1923; Kevles 1985.

12 The stories have been most influentially told by Fallows 1980; Gould 1981; Kamin 1974.

13 Snyderman and Herrnstein 1983.

14 Snyderman and Herrnstein 1983.

15 Lippmann 1922, p. 10.

16 Lippmann 1923, p. 46.

17 Snyderman and Herrnstein 1983.

18 Maier and Schneirla 1935.

19 Skinner 1938.

20 Skinner 1953; Skinner 1971.

21 Jensen 1969.

22 Hirsch 1975, p. 3.

23 Pearson 1992.

24 Herrnstein 1971.

25 Griggs et al. v. Duke Power Co., 1971.

26 Quoted in Jensen 1980, p. 13.

27 Elliott 1987.

28 Kamin 1974, p. 3.

29 O. Gillie. 1976. Crucial data faked by eminent psychologist. Sunday Times (London), Oct. 24, pp. 1-2.

30 Joynson 1989; Fletcher 1991.

31 Bouchard et al. 1990.

32 Gould 1981.

33 Gould 1981, pp. 27-28.

34 Snyderman and Rothman 1988.

35 Binet himself had died by the time Piaget arrived at the Sorbonne in 1919, but the work on intelligence testing was being carried forward by his collaborator on the first Binet test, Théodore Simon (see Piaget 1952).

36 Sternberg 1988, p. 8.

37 Sternberg 1985, p. 18.

38 Block and Dworkin 1974.

39 Gardner 1983, pp. 60-61. Emphasis in the original.

40 Gardner 1983, p. 278.

41 Gardner 1983, p. xi. Emphasis in original.

42 Gardner 1983, p. 17. In fact, Gardner’s claim about the arbitrariness of factor analysis is incorrect.

43 Gardner 1983, pp. xi-xii.

44 Gardner 1983, p. 17.

45 Although some of the accomplishments of mental calculators remain inexplicable, much has been learned about how they are done. See Jensen 1990; O’Connor and Hermelin 1987.

46 Ceci and Liker 1986.

47 An accurate and highly readable summary of the major points is Seligman 1992. For those who are prepared to dig deeper, Jensen 1980 remains an authoritative statement on most of the basic issues despite the passage of time since it was published.

Introduction to Part I

1 Reuning 1988.

2 Robert Laird Collier, quoted in Manchester 1983, p. 79.

Chapter 1

1 Bender 1960, p. 2.

2 The national SAT-V in 1952 was 476, a little more than a standard deviation lower than the Harvard mean. Perhaps the average Harvard student was much farther ahead of the national average than the text suggests because the national SAT-taking population was so selective, representing only 6.8 percent of high school graduates. But one of the oddities of the 1950s, discussed in more detail in Chapter 18, is that the SAT means remained constant through the decade and into 1963, even as the size of the test-taking population mushroomed. By 1963, when SAT scores hit their all-time high in the post-1952 period, the test-taking population had grown to 47.9 percent of all high school graduates. Thus there is reason to think that the comparison is about the same as the one that would have been produced by a much larger number of test takers in 1952.

3 Bender 1960, p. 4.

4 In the 1920s, fewer than 30 percent of all young people graduated from high school, and the differences between the cognitive ability of graduates and nongraduates were small, as discussed in Chapter 6. Something between 60 and 75 percent of the 18-year-olds in the top IQ quartile never even made it into the calculations shown in the figure on page 34. From the early 1960s on, 70 percent of the nation’s youth have graduated from high school, and we know that the difference between the ability of those who do and do not graduate has been large. More concretely, of a nationally representative sample of youth who were administered a highly regarded psychometric test in 1980 when they were 15 and 16 years old, 95 percent of those who scored in the top quartile subsequently graduated from high school, and another 4 percent eventually got a general equivalency diploma. The test was the Armed Forces Qualification Test, and the sample was the 1964 birth cohort of the National Longitudinal Survey of Youth (NLSY), discussed in detail in the introduction to Part II. The figure for the proportion entering colleges is based on the NLSY cohorts and students entering colleges over 1981-1983.

5 The top IQ quartile of the NLSY that first attended college in 1981-1983 was split as follows: 21 percent did not continue to college in the first year after graduation, 18 percent went to a two-year college, and 61 percent attended a four-year college.

6 O’Brien 1928. These percentages are based on high school graduates, which accounts for the high percentages of students shown as going to college in the 1920s. If the estimates had been based on the proportion of the 18-year-olds who have been graduating from high school since the 1970s, those proportions would have been much smaller. The shape of the curve, however, would be essentially unchanged (because the IQ distribution of students who did not complete high school was so close to the distribution of those who did; see Finch 1946).

7 Another excellent database from the same period, a nationally representative sample tested with the Preliminary SAT in 1960 and followed up a year later, confirms results from Project TALENT, a large, nationally representative sample of high school youths taken in 1960 (Seibel 1962). Among those who scored in the bottom quartile, for example, only 11 percent went to college; of those in the top quartile, 79 percent went to college; of those in the top 5 percent, more than 95 percent went to college.

8 These data are taken from Project TALENT in 1960.

9 From the NLSY, described in the introduction to Part II.

10 The test was Form A of the Otis. Brigham 1932, Table XVIII, p. 336.

11 The schools are Brown, Bryn Mawr, Columbia, Harvard, Mount Holyoke, Princeton, Radcliffe, Smith, University of Pennsylvania (with separate means for men and women), Vassar, Wellesley, Williams, and Yale.

12 Learned and Wood 1938.

13 Not including the University of Pennsylvania, one of the elite schools.

14 Between the earliest SAT and 1964, the SAT had divided into a verbal and a math score. It is a moot question whether the modern overall SAT or the verbal SAT is more comparable to the original SAT. In the comparisons being made here, we rely on the Educational Testing Service norm studies, which enable us to place an SAT value on the national 18-year-old cohort, not just the cohort who takes the test. We explain the norm studies in Chapter 18.

15 This is not the usual SAT distribution, which is ordinarily restricted to college-bound seniors, but rather shows the distribution for a nationally representative sample of all high school seniors, based on the norm studies mentioned in note 14. It is restricted to persons still in high school and does not include the 34 percent of 18-year-olds who were not.

16 We know how high the scores were for many schools as of the early 1960s. We know Harvard’s scores in the early 1950s. We can further be confident that no school was much more selective than Harvard as of 1952 (with the possible exception of science students going to Cal Tech and MIT). Therefore means for virtually all of the other schools as of 1952 had to be near or below Harvard’s, and the dramatic changes for the other elite schools had to be occurring in the same comparatively brief period of time concentrated in the 1950s.

17 Bender 1960, p. 6.

18 This percentage is derived from 1960 data reported by Bender 1960, p. 15, regarding the median family income of candidates who applied for scholarship aid, were denied, but came to Harvard anyway. Total costs at Harvard in 1960 represented 21 percent of that median.

19 The families for whom a year at Harvard represented less than 20 percent of their income constituted approximately 5.8 percent of families in 1950 and 5.5 percent of families in 1960. Estimated from U.S. Bureau of the Census 1975, G-1-15.

20 The faculty’s views were expressed in Faculty of Arts and Sciences 1960.

21 Bender 1960, p. 31.

22 For an analysis of the ascriptive qualities that Harvard continued to use for admissions choices in the 1980s, see Karen 1991.

23 The increase in applications to Harvard had been just as rapid from 1952 to 1958, when the size of the birth cohorts was virtually constant, as in 1959 and 1960, when they started to increase.

24 For an analysis of forces driving more recent increases in applications, see Clotfelter 1990 and Cook and Frank 1992.

25 Cook and Frank 1992.

26 Harvard, MIT, Princeton, Stanford, and Cal Tech were in the top seven in all three decades. Columbia and Chicago were the other two in the 1960s, Yale and Cornell in the 1970s and 1980s. Cook and Frank 1992, Table 3.

27 Cook and Frank 1992, Table 4. The list of “most competitive” consists of the thirty-three schools named by Barron’s in its 1980 list. The Cook and Frank analysis generally suggests that the concentration of top students in a few schools may have plateaued during the 1970s, then resumed again in the 1980s.

28 U.S. News & World Report, October 15, 1990, pp. 116-134. It is not necessary to insist that this ranking is precisely accurate. It is enough that it includes all the schools that most people would name if they were asked to list the nation’s top schools, and the method for arriving at the list of fifty seems reasonable.

29 The College Board ethnic and race breakdowns for 1991, available by request from the College Board. There is also reason to believe that an extremely high proportion of high school students in each senior class who have the potential to score in the high 600s and the 700s on the SAT actually take the test. See Murray and Herrnstein 1992.

30 See Chapter 18 for where the SAT population resides in the national context.

31 These represent normal distributions based on estimates drawn from the Learned data that the mean IQ of Pennsylvania graduates in 1930 was approximately two-thirds of a standard deviation above the mean (the mean of incoming freshmen was .48 SDs above the mean), and from the Brigham data that the graduates of the Ivy League and Seven Sisters were approximately 1.25 SDs above the mean (they were 1.1 SDs above the mean as freshmen, and the Ivy League graduated extremely high proportions of the incoming students).

32 The distributions for the main groups are based on the NLSY, for youths who came of college age from 1981 to 1983 and have been followed through the 1990 interview wave. The top dozen universities are those ranked 1 through 12 in the U.S. News & World Report survey for 1990. U.S. News & World Report, October 15, 1990, pp. 116-134. The analysis is based on published distribution of SAT-Verbal scores, which is the more highly g-loaded of the SAT subtests. The estimated verbal mean (weighted by size of the freshman class) for these twenty schools, based on their published SAT distributions, is 633. The estimated mean for graduates is 650 (dropout rates for these schools are comparatively low but highly concentrated among those with the lowest entering scores). This compares with a national SAT-Verbal norm estimated at 376 with an SD of 102 (Braun, Centra, and King, 1987, Appendix B). The distribution in the figure on page 46 converts the SAT data to standardized scores. The implicit assumption is that AFQT (Armed Forces Qualification Test, an intelligence test discussed in Appendix 3) and SAT-Verbal measure the same thing, which is surely wrong to some degree. Both tests are highly g-loaded, however, and it is reasonable to conclude that youths who have a mean 2.5 SDs above the mean on the SAT would have means somewhere close to that on a full-fledged mental test.
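The conversion of SAT data to standardized scores described in note 32 is a subtraction and a division. The sketch below uses only the figures quoted in the note (a national SAT-Verbal norm of 376 with an SD of 102, and estimated means of 633 and 650); the function and variable names are illustrative assumptions, not part of the original analysis.

```python
def standardize(score: float, mean: float, sd: float) -> float:
    """Express a score as standard deviations above (or below) the mean."""
    return (score - mean) / sd


# National SAT-Verbal norm quoted in note 32.
NATIONAL_SATV_MEAN = 376.0
NATIONAL_SATV_SD = 102.0

# Estimated means for entering freshmen and for graduates, from the same note.
freshman_z = standardize(633.0, NATIONAL_SATV_MEAN, NATIONAL_SATV_SD)
graduate_z = standardize(650.0, NATIONAL_SATV_MEAN, NATIONAL_SATV_SD)

print(f"{freshman_z:.2f}")  # 2.52
print(f"{graduate_z:.2f}")  # 2.69
```

The freshman figure matches the 2.5 SDs cited in the note; the graduate figure comes out somewhat higher.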

33 We have defined these as the first twelve of the listed universities in the U.S. News & World Report listing for 1990. They are (in the order of their ranking) Harvard, Stanford, Yale, Princeton, Cal Tech, MIT, Duke, Dartmouth, Cornell, Columbia, University of Chicago, and Brown.

34 The probabilities are based on the proportions of people entering these categories in the 1980s, which means that they become progressively too generous for older readers (when the proportion of people getting college degrees was smaller). But this is a technicality; the odds are already so tiny that they are for practical purposes unaffected by further restrictions. The figure for college degrees reflects the final educational attainment of members of the NLSY, who were born in 1957 through 1964, as of 1990 (when the youngest was 25), as a weighted proportion of the NLSY population. The figure for Ph.D., law, and medical degrees is based on the number of degrees awarded over 1980-1989 expressed as a proportion of the population age 26 in each of those years. The figure for graduates of the dozen elite schools is based on the number of undergraduate degrees awarded by these institutions in 1989 (the figure has varied little for many years), expressed as a proportion of the population age 22 in 1989 (incidentally, the smallest cohort since the mid-1970s).

35 Based on the median percentages for those score intervals among those schools.

Chapter 2

1 Herrnstein 1973.

2 For a one-source discussion of IQs and occupations, see Matarazzo 1972, chap. 7. Also see Jencks et al. 1972 and Sewell and Hauser 1975 for comprehensive analyses of particular sets of data. The literature is large and extends back to the early part of the century. For earlier studies, see, for example, Bingham 1937; Clark and Gist 1938; Fryer 1922; Pond 1933; Stewart 1947; Terman 1942. For more recent estimates of minimum scores for a wide variety of occupations, see E. F. Wonderlic & Associates 1983; U.S. Department of Labor 1970.

3 Jencks et al. 1972.

4 Fallows 1985.

5 The Fels Longitudinal Study; see McCall 1977.

6 The correlation was a sizable .5-.6, on a scale that goes from −1 to +1. See Chapter 3 and Appendix 1 for a fuller explanation of what the correlation coefficient means. Job status for the boys was about equally well predicted by childhood IQ as by their completed educational levels; for the girls, job status was more correlated with childhood IQ than with educational attainment. In another study, adult intelligence was also more highly correlated with occupational status than with educational attainment (see Duncan 1968). But this may make a somewhat different point, inasmuch as adult intelligence may itself be affected by educational attainment, in contrast to the IQ one chalks up at age 7 or 8 years. In yet another study, based on Swedish data, adult income (as distinguished from occupational status) was less strongly dependent on childhood IQ (age 10) than on eventual educational attainment (T. Husén’s data presented in Griliches 1970), although being strongly dependent on both. Other analyses come up with different assessments of the underlying relationships (e.g., Bowles and Gintis 1976; Jencks 1979). Not surprisingly, the empirical picture, being extremely diverse and rich, has lent itself to myriad formal analyses, which we will make no attempt to review. In Chapters 3 and 4, we present our interpretation of the link between individual ability and occupation. We also discuss some of the evident exceptions to these findings.

7 Many of the major studies (e.g., Duncan 1968; Jencks et al. 1972; McCall 1977; Sewell and Hauser 1975) include variables describing familial socioeconomic status, which prove to be somewhat predictive of a person’s own status.

8 For a fuller discussion of both the explanation and the controversy, see Herrnstein 1973.

9 Teasdale, Sorenson, and Owen 1984.

10 The authors of the study offered as an explanation for this pattern of results the well-established pattern of resemblances among relatives in IQ, presumably owing to the genes that natural siblings share and that adoptive siblings do not share. It could, of course, be traits of personality rather than of intellect that tie a family’s occupational destinies together. However, the small body of evidence bearing on personality traits finds them to be distinctly weaker predictors of job status than is IQ. Another study, of over 1,000 pairs of Norwegian twins, supported the conclusion that the resemblance in job status among close relatives is largely explained by their similarity in IQ and that genes play a significant role in this similarity. See Tambs et al. 1989.

11 For some of the most detailed distributional data, see Stewart 1947, Table 1.

12 Matarazzo 1972, p. 177.

13 Specific cognitive strengths also vary by occupation, with engineers tending to score higher on analytic and quantitative sections of the Graduate Record Exams, while English professors do better on the verbal portions (e.g., Wah and Robinson 1990, Figure 2.2).

14 With a mean of 100 and SD of 15, an IQ score of 120 cuts off the 91st percentile of a normal distribution. But the IQ distribution tends to be skewed so that it is fat on the right tail. To say that 120 cuts off the top tenth is only approximate but close enough for our purposes.
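The percentile in note 14 can be checked with Python's standard library. This is a sketch under the note's own simplifying assumption of an exactly normal distribution with a mean of 100 and an SD of 15, which the note itself says is only approximate; the variable names are ours.

```python
from statistics import NormalDist

# Idealized IQ distribution: mean 100, SD 15 (a simplifying assumption).
iq = NormalDist(mu=100, sigma=15)

# Share of the distribution falling below an IQ of 120.
share_below_120 = iq.cdf(120)
# Score that would cut off exactly the top tenth.
top_decile_cutoff = iq.inv_cdf(0.90)

print(f"{share_below_120:.3f}")    # 0.909
print(f"{top_decile_cutoff:.1f}")  # 119.2
```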

15 The procedure we used to create the figure on page 56 yielded an estimate of 23.2 percent of the top IQ decile in high-IQ occupations in 1990. Of the top IQ decile in the NLSY as of 1990, when they ranged in age from 25 to 32, 22.2 percent of the top decile were employed in the dozen high-IQ occupations. The analysis excludes those who were still enrolled in school in 1990 and those who were in the military (because their occupation within the military was unknown). The NLSY figure is an underestimate (compared to the national estimate) in that those who are still students will disproportionately enter high-IQ professions. On the other hand, the NLSY would be likely to exceed the national data in the figure insofar as the entire NLSY age cohort is of working age, without retirees. One other comment on possible distortions over time: It might be hypothesized that, since 1900, the mean has dropped and distribution has spread, as more and more people have entered those professions. The plausibility of the hypothesis is arguable; indeed, there are reasons for hypothesizing that the opposite has occurred (for the same reasons educational stratification has raised the IQ of students at the elite colleges). But it would not materially affect the plot in the figure on page 56 even if true, because the numbers of people in those professions were so small in the early decades of the century. It may also be noted that in the NLSY data, 46 percent of all job slots in the high-IQ occupations were held by people in the top decile, again matching our conjecture about the IQ scores within the occupations.

16 Terman and Oden 1947.

17 The NLSY cannot answer that question, because even a sample of 11,878 (the number that took the AFQT) is too small to yield adequate sample sizes for analyzing subgroups in the top tenth of the top percentile.

18 There are not that many people with IQs of 120+ left over, after the known concentrations of them in the high IQ occupations are taken into account.

19 The literature is extensive. The studies used for this discussion, in addition to those cited specifically, include Bendix 1949; Macmahon and Millett 1939; Pierson 1969; Stanley, Mann, and Doig 1967; Sturdivant and Adler 1976; Vance 1966; Warner and Abegglen 1955.

20 Newcomer 1955, Table 24, p. 68.

21 Clews 1908, pp. 27, 37, quoted in Newcomer 1955, p. 66.

22 The data are drawn from Newcomer 1955.

23 Burck 1976. The Fortune survey was designed to yield data comparable with those in Newcomer 1955.

24 The ostensible decline in college degrees after 1950 is explained by college graduates’ going on to get additional educational credentials. For another study of educational attainment of CEOs that shows the same pattern, see Priest 1982.

25 U.S. Bureau of the Census 1992, Tables 18, 615, and U.S. Department of Labor 1991, Table 22.

26 Excluding accountants, who were already counted in the high-IQ professions.

27 Matarazzo 1972, Table 7.3, p. 178.

Chapter 3

1 Bok 1985b. In another setting, again discussing the SAT, he wrote, “Such tests are only modestly correlated with subsequent academic success and give no reliable indication of achievement in later life” (Bok 1985a, p. 15).

2 The correlation of IQ with income in a restricted population such as Harvard graduates could be negative when people toward the top of the IQ distribution are disproportionately drawn into academia, where they make a decent living but seldom much more than that, while students with IQs of “only” 120 and 130 will more often go into the business world, where they may get rich.

3 See Chapter 19; Dunnette 1976; Ghiselli 1973.

4 Technically, a correlation coefficient is a ratio, with the covariation of the two variables in the numerator and the product of the separate standard deviations of the two variables in the denominator. The formula for computing a Pearson product moment correlation r (the kind that we will be using throughout) is: $r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}}$, where X and Y refer to the actual values for each case and $\bar{X}$ and $\bar{Y}$ refer to the mean values of X and Y, respectively.
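The ratio described in note 4 (covariation of the two variables over the product of their separate standard deviations) can be sketched in a few lines of Python. The function name and the sample values are ours and purely illustrative; the values are made up and are not data from the text.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product moment correlation of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least two cases")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: covariation of X and Y around their means.
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: product of the two spreads (the 1/N factors cancel).
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable does not vary")
    return covariation / sqrt(ss_x * ss_y)


# Made-up values for illustration only.
years_of_education = [10, 12, 12, 14, 16, 18]
income_thousands = [22, 30, 27, 41, 38, 55]
r = pearson_r(years_of_education, income_thousands)
print(f"{r:.2f}")
```

Because the result is a ratio of covariation to spread, it always falls between -1 and +1, whatever units the two variables are measured in.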

5 We limited the sample to families making less than $100,000, so as to avoid some distracting technical issues that arise when analyzing income across the entire spectrum (e.g., the appropriateness of using logged values rather than raw values). The results from the 1 percent sample are in line with the statistics produced when the analysis is repeated for the entire national sample: a correlation of .31 and an increment of $2,700 per year of additional education. Income data are for 1989, expressed in 1990 dollars.

6 An important distinction: The underlying relationship persists in a sample with restricted range, but the restriction of range makes the relationship harder to identify (i.e., the correlation coefficient is attenuated, sometimes to near zero).

Forgetting about restriction of range produces fallacious reasoning that is remarkably common, even among academics who are presumably familiar with the problem. For example, psychologist David McClelland, writing at the height of the anti-IQ era in 1973, argued against any relationship between career success and IQ, pointing out that whereas college graduates got better jobs than nongraduates, the academic records of graduates did not correlate with job success, even though college grades correlate with IQ. He added, anecdotally, that he recalled his own college class—Wesleyan University, a top-rated small college—and was convinced that the eight best and eight worst students in his class had not done much differently in their subsequent careers (McClelland 1973). This kind of argument is also common in everyday life, as in the advice offered by friends during the course of writing this book. There was, for example, our friend the nuclear physicist, who prefaced his remarks by saying, “I don’t think I’m any smarter than the average nuclear physicist …” Or an engineer friend, a key figure in the Apollo lunar landing program, who insisted that this IQ business is much overemphasized. He had been a C student in college and would not have even graduated, except that he managed to pull himself together in his senior year. His conclusion was that motivation was important, not IQ. Did he happen to know what his IQ was? Sure, he replied. It was 146. He was right, insofar as motivation can make the difference between being a first-rate rocket scientist and a mediocre one—if you start with an IQ of 146. But the population with a score of 146 (or above) represents something less than 0.2 percent of the population. Similarly, correlations of IQ and job success among college graduates suffer from restriction of range. The more selective the group is, the greater the restriction, which is why Derek Bok may plausibly (if not quite accurately) have claimed that SAT scores have “no correlation at all with what you do in the rest of your life” if he was talking about Harvard students.
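The attenuation described in note 6 can be illustrated with a small simulation. Everything in this sketch is an assumption made for illustration: the underlying correlation of .5, the selection cutoff of 1.5 SDs above the mean, the sample size, and all names. None of it is drawn from the studies cited.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length samples."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)


rng = random.Random(0)   # fixed seed so the run is repeatable
TRUE_R = 0.5             # assumed correlation in the whole population
N = 50_000               # assumed population sample size
CUTOFF = 1.5             # assumed selection threshold, in SD units

ability, outcome = [], []
for _ in range(N):
    a = rng.gauss(0, 1)
    noise = rng.gauss(0, 1)
    ability.append(a)
    # Outcome built so that its correlation with ability is TRUE_R.
    outcome.append(TRUE_R * a + sqrt(1 - TRUE_R ** 2) * noise)

full_r = pearson_r(ability, outcome)

# Keep only the highly selected group, as a selective college would.
selected = [(a, o) for a, o in zip(ability, outcome) if a > CUTOFF]
restricted_r = pearson_r([a for a, _ in selected], [o for _, o in selected])

print(f"whole population: r = {full_r:.2f}")
print(f"selected group:   r = {restricted_r:.2f} (n = {len(selected)})")
```

The relationship that generates the outcome is the same for every simulated case, yet the correlation computed within the selected group is much smaller than in the whole sample, which is the distinction the note draws.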

7 E.g., Fallows 1985.

8 See Chapter 20 for more detail.

9 Griggs v. Duke Power, 401 U.S. 424 (1971).

10 The doctrine has been built into the U.S. Employment and Training Service’s General Aptitude Test Battery (GATB), into the federal civil service’s Professional and Administrative Career Examination (PACE), and into the military’s Armed Services Vocational Aptitude Battery (ASVAB). Bartholet 1982; Braun 1992; Gifford 1989; Kelman 1991; Seymour 1988. For a survey of test instruments and their use, see Friedman and Williams 1982.

11 For a recent review of the expert community as a whole, see Schmidt and Ones 1992.

12 Hartigan and Wigdor 1989 and Schmidt and Hunter 1991 represent the two ends of the range of expert opinion.

13 For a sampling of the new methods, see Bangert-Drowns 1986; Glass 1976; Glass, McGaw, and Smith 1981; Hunter and Schmidt 1990. Meta-analytic strategies had been tried for decades prior to the 1970s, but it was after the advent of powerful computers and statistical software that many of the techniques became practicable.

14 Hartigan and Wigdor 1989; Hunter and Schmidt 1990; Schmidt and Hunter 1981.

15 We have used the terms job productivity or job performance or performance ratings without explaining what they mean or how they are measured. On the other hand, all of us have a sense of what job productivity is like—we are confident that we know who are the better and worse secretaries, managers, and colleagues among those with whom we work closely. But how is this knowledge to be captured in objective measures? Ratings by supervisors or peers? Samples of work in the various tasks that a job demands? Tests of job knowledge? Job tenure or promotion? Direct cost accounting of workers’ output? There is no way to answer such a question decisively, for people may legitimately disagree about what it is about a worker’s performance that is most worth predicting. As a practical matter, ratings by supervisors, being the most readily obtained and the least intrusive in the workplace, have dominated the literature (Hunter 1986). But it is natural to wonder whether supervisor ratings, besides being easy to get, truly measure how well workers perform rather than, say, how they get along with the boss or how they look (Guion 1983).

To get a better fix on what the various measures of performance mean, it is useful to evaluate a number of studies that have included measures of cognitive ability, supervisor ratings, samples of work, and tests of job knowledge. Work samples are usually obtained by setting up stations for workers to do the various tasks required by their jobs and having their work evaluated in some reasonably objective way. Different occupations lend themselves more or less plausibly to this kind of simulated performance. The same is true of written or oral tests of job knowledge.

One of the field’s leaders, John Hunter, has examined the correlational structure that relates these different ways of looking at job performance to each other and to an intelligence test score (Hunter 1983, 1986). In a study of 1,800 workers, Hunter found a strong direct link between intelligence and job knowledge and a much smaller direct one between intelligence and performance in work sample tasks. By direct we mean that the variables predict each other without taking any other variable into account. The small direct link between intelligence and work sample was augmented by a large indirect link, via job knowledge: a person’s intelligence predicted his knowledge of the job, and his knowledge in turn predicted his work sample. The correlation (after the usual statistical corrections) between intelligence and job knowledge was .8; between intelligence and work sample it was .75. The indirect link between intelligence and work sample, via job knowledge, was larger by half than the direct one (Hunter 1986).

The correlation between intelligence and supervisor ratings in Hunter’s analysis was .47. Upon analysis, Hunter found that the primary reason is that brighter workers know more about their jobs, and supervisors respond favorably to their knowledge. A comparable analysis of approximately 1,500 military personnel in four specialties produced the same basic finding (Hunter 1986). This may seem a weakness of the supervisor rating measure, but is it really? How much workers know about their jobs correlates, on the one hand, with their intelligence and, on the other, with both how they do on direct tests of their work and how they are rated by their supervisors. A worker’s intelligence influences how much he learns about the job, and job knowledge contributes to proficiency. The knowledge also influences the impression the worker makes on a supervisor rating more than the work as measured by a work sample test (which, of course, the supervisor may never see in the ordinary course of business). Using supervisor rating as a measure of proficiency is thereby justified, without having to claim that the rating directly measures proficiency.

Hunter found that work samples are more dependent on intelligence and job knowledge than are supervisor ratings. Supervisor ratings, which are so predominant in this literature, may, in other words, underestimate how important intelligence is for proficiency. Recent research suggests that supervisor ratings in fact do underestimate the correlation between intelligence and productivity (Becker and Huselid 1992). But we should acknowledge again that none of the measures of proficiency—work samples, supervisor ratings, or job knowledge tests—is free of the taint of artificiality, let alone arbitrariness. Supervisor ratings may be biased in many ways; a test of job knowledge is a test, not a job; and even a worker going from one work station to another under the watchful eye of an industrial psychologist may be revealing something other than everyday competence. It has been suggested that the various contrived measures of workers tell us more about maximum performance than they do about typical, day-to-day proficiency (Guion 1983). We therefore advise that the quantitative estimates we present here (or that can be found in the technical literature at large) be considered only tentative and suggestive.

16 The average validity of .4 is obtained after standard statistical corrections of various sorts. The two most important of these are a correction for test unreliability or measurement error and a correction for restriction of range among the workers in any occupation. All of the validities in this section of the chapter are similarly corrected, unless otherwise noted.
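Note 16 names two corrections without giving formulas. The sketch below uses two textbook forms, the classical disattenuation formula for measurement error and the Thorndike Case II formula for direct range restriction; whether the studies cited applied exactly these forms, or in this order, is an assumption on our part. The function names and all input values are hypothetical and chosen only to show the arithmetic.

```python
from math import sqrt


def correct_for_unreliability(observed_r: float, reliability: float) -> float:
    """Disattenuate a correlation for measurement error in one variable."""
    if not 0 < reliability <= 1:
        raise ValueError("reliability must be in (0, 1]")
    return observed_r / sqrt(reliability)


def correct_for_range_restriction(observed_r: float, sd_ratio: float) -> float:
    """Thorndike Case II correction.

    sd_ratio is the predictor's SD in the unrestricted population divided
    by its SD in the restricted (selected) group.
    """
    if sd_ratio <= 0:
        raise ValueError("sd_ratio must be positive")
    return observed_r * sd_ratio / sqrt(1 + observed_r ** 2 * (sd_ratio ** 2 - 1))


# Hypothetical inputs, not taken from any study cited here.
observed_validity = 0.30
criterion_reliability = 0.80
sd_ratio = 1.25

step_1 = correct_for_unreliability(observed_validity, criterion_reliability)
corrected_validity = correct_for_range_restriction(step_1, sd_ratio)
print(f"{observed_validity:.2f} -> {step_1:.3f} -> {corrected_validity:.3f}")
```

Both corrections can only raise a positive observed correlation, so the assumed reliability and SD ratio matter a good deal to the final figure.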

17 Ghiselli 1966, 1973; Hunter and Hunter 1984, Table 1.

18 Hunter 1980; Hunter and Hunter 1984.

19 Where available, ratings by peers, tests of job knowledge, and actual work samples often come close to ability measures as predictors of job performance (Hunter and Hunter 1984). But aptitude tests have the practical advantage that they can be administered relatively inexpensively to large numbers of applicants, and they do not depend on applicants’ having been on the job for any length of time.

20 E. F. Wonderlic & Associates 1983; Hunter 1989. These validities, which are even higher than the ones presented in the table on page 74, are for training success rather than for measures of job performance and are more directly comparable with the column for training success in the GATB studies than the column for job proficiency. Regarding job performance, one major study evaluated the performance of about 1,500 air force enlisted men and women working in eight military specialties, chosen to be representative of military specialties in the air force. Performance was variously measured: by defining a set of tasks involved in each job, then training a group of evaluators to assess those specific tasks; by interviews of the personnel on technical aspects of their jobs; by supervisor ratings after training the supervisors; and combinations of methods. The average correlation between AFQT score and a hands-on job performance measure was .40, with the highest among the precision measurement equipment specialists and the avionics communications specialists and the lowest among the air traffic control operators and the air crew life support specialists. Insofar as the jobs were restricted to those held by enlisted men, the distribution of jobs was somewhat skewed toward the lower end of the skill range. We do not have an available estimate of the validity of the AFQT over all military jobs.

21 Hartigan and Wigdor 1989.

22 It is one of the chronically frustrating experiences when reading scientific results: Two sets of experts, supposedly using comparable data, come out with markedly different conclusions, and the reasons for the differences are buried in technical and opaque language. How is it possible for a layperson to decide who is right? The different estimates of mean validity of the GATB (.45 according to Hunter, Schmidt, and some others; .25 according to the Hartigan committee) are an instructive case in point.

Sometimes the differences really are technical and opaque. For example, the Hartigan committee based its estimate on the assumption that the reliability of supervisor ratings was higher than other studies assumed—.8 instead of .6 (Hartigan and Wigdor 1989, p. 170). By assuming a higher reliability, the committee’s correction for measurement error was smaller than Hunter’s. Deciding between the Hartigan committee’s use of .8 as the reliability of supervisor ratings instead of the .6 used by Hunter is impossible for anyone who is not intimately familiar with a large and scattered literature on that topic, and even then the choice remains a matter of judgment. But the Hartigan committee’s decision not to correct for restriction of range, which makes the largest difference in their estimates of the overall validity, is based on a much different kind of disagreement. Here, a layperson is as qualified to decide as an expert, for this is a disagreement about what question is being answered.

John Hunter and others assumed that for any job the applicant pool is the entire U.S. work force. That is, they sought an answer to the question, “What is the relationship between job performance and intelligence for the work force at large?” The Hartigan committee objected to their assumption on grounds that, in practice, the applicant pool for any particular job is not the entire U.S. work force but people who have a chance to get the job. As they accurately noted, “People gravitate to jobs for which they are potentially suited” (Hartigan and Wigdor 1989, p. 166).

But embedded in the committee’s objection to Hunter’s estimates is a tacit switch in the question that the analysis is supposed to answer. The Hartigan committee sought an answer to the question, “Among those people who apply for such-and-such a position, what is the relationship between intelligence and job performance?” If one’s objective is not to discourage people who weigh only 250 pounds from applying for jobs as tackles in the NFL, to return to our analogy, then the Hartigan committee’s question is the appropriate one. Of course, by minimizing the validity of weight, a large number of 150-pound linemen may apply for the jobs. We therefore conclude that the assumption used by Hunter and Schmidt (among others), that restriction of range calculations should be based on the entire work force, is self-evidently the appropriate choice if one wants to know the overall relationship of IQ to job performance and its economic consequences.

23 The ASVAB comprises ten subtests: General Science, Arithmetic Reasoning, Word Knowledge, Paragraph Comprehension, Numerical Operations, Coding Speed, Auto/Shop Information, Mathematics Knowledge, Mechanical Comprehension, and Electronics Information. Only Numerical Operations and Coding Speed are highly speeded; the other eight are nonspeeded “power” tests. All the armed services use the four MAGE composites, for Mechanical, Administrative, General, and Electronics specialties, each of which includes three or four subtests in a particular weighting. These composites are supposed to predict a recruit’s trainability for the particular specialty. The AFQT is yet another composite from the ASVAB, selected so as to measure g efficiently. See Appendix 3.

24 About 80 percent of the sample had graduated from high school and had no further civilian schooling, fewer than 1 percent had failed to graduate from high school, and fewer than 2 percent had graduated from college; the remainder had some post-high school civilian schooling short of a college degree. The modal person in the sample was a white male between 19 and 20 years old, but the sample also included thousands of women and people from all American ethnic groups; their ages ranged from a minimum of 17 to almost 15 percent above 23 years (see Ree and Earles 1990b). Other studies, using educationally heterogeneous samples, have in fact shown that, holding AFQT constant, high school graduates are more likely to avoid disciplinary action, to be recommended for reenlistment, and to be promoted to higher rank than nongraduates (Office of the Assistant Secretary of Defense 1980). Current enlistment policies reflect the independent predictiveness of education, in that of two applicants with equal AFQT score, the high school graduate is selected over the nongraduate if only one is to be accepted.

25 In fact, there may be some upward bias in these correlations, inasmuch as they were not cross validated to exclude capitalization on chance.

26 What does it mean to “account for the observed variation”? Think of it in this way: A group of recruits finishes its training course; their grades vary. How much less would they have varied had they entered the course with the same level of g? This may seem like a hypothetical question, but it is answered simply by squaring the correlation between the recruits’ level of g and their final grades. In general, given any two variables, the degree to which variation in either is explained (or accounted for, in statistical lingo) by the other variable is obtained by squaring the correlation between them. For example, a perfect correlation of 1 between two variables means that each of the variables fully explains the observed variations in the other. When two variables are perfectly correlated, they are also perfectly redundant since if we know the value of one of them, we also know the value of the other without having to measure it. Hence, 1 squared is 1.0 or 100 percent. A correlation of .5 means that each variable explains, or accounts for, 25 percent of the observed variation in the other; a correlation of 0 means that neither variable accounts for any of the observed variation in the other.

In the Ree and Earles study, over all eighty-nine occupational schools, the average value of this squared correlation was 58 percent (which corresponds to a correlation of .76). g, in other words, accounted for almost 60 percent of the observed variation in school grades in the average military course, once the results were corrected for range restriction. Even without a correction for range restriction, g accounted for over 20 percent of the variance in school grades on the average (corresponding to a correlation of .45).
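The squaring rule described in note 26 can be checked against the figures quoted for the Ree and Earles study. What follows is a minimal Python sketch; the function names are illustrative and do not come from the source.

```python
import math

def variance_explained(r):
    """Share of variance in one variable accounted for by the other."""
    return r ** 2

def correlation_from_share(share):
    """Magnitude of the correlation implied by a share of variance explained."""
    return math.sqrt(share)

# A correlation of .5 accounts for 25 percent of the variance.
print(round(variance_explained(0.5), 2))        # 0.25
# 58 percent of the variance corresponds to a correlation of about .76.
print(round(correlation_from_share(0.58), 2))   # 0.76
# A correlation of .45 accounts for just over 20 percent of the variance.
print(round(variance_explained(0.45), 4))       # 0.2025
```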

27 Welsh, Watson, and Ree 1990.

28 Jones 1988. A similar analysis was performed for job performance but, because of the expense of obtaining special performance measures, with a much smaller sample (1,545) spread across just eight enlisted job specialties (Ree and Earles 1991). The correlations with g in this study did not reach the extraordinarily high levels of predictiveness found for school grades, and the other cognitive factors were relatively more important for job performance than for school grades, points to which we shall return. But combining the results with the previously cited job performance study of air force personnel (Office of the Assistant Secretary of Defense for Force Management and Personnel 1989), the job predictiveness of AFQT for the specialties is correlated above .9 with the job predictiveness of g. Using the highest of the various correlations between job performance measures and g, the product-moment correlation is .97 and the Spearman rank-order correlation is .93. In other words, in predicting job performance, at least for these jobs and these performance tests, the validity of an AFQT score is virtually entirely explained by how well it measures g, per se.

29 Thorndike 1986. The comparison is between the predictiveness of the first factor extracted by factor analysis of the five cognitive subtests of GATB versus the regression-weighted subtest scores themselves, for cross-validating samples of at least fifty workers in each of the twenty-eight occupations.

30 Hawk 1986; Jensen 1980, 1986; Linn 1986.

31 For the linear relationship of cognitive ability, see Schmidt, Ones, and Hunter 1992. For the nonlinear relationship of job experiences see Blankenship and Taylor 1938; Ghiselli and Brown 1947; Taylor and Smith 1956.

32 Hawk 1970; Hunter and Schmidt 1982.

33 Humphreys 1968, 1973; Wilson 1983.

34 See p. 66.

35 Butler and McCauley 1987.

36 McDaniel, Schmidt, and Hunter 1986.

37 Schmidt et al. 1988.

38 Maier and Hiatt 1985.

39 This story echoes the mixed findings for the learning of simple tasks in the psychological laboratory. Depending on which measures are used to predict performance and which tasks are being predicted, one can expect either to see convergence of performance with practice, or no convergence, or even divergence under some circumstances. See Ackermann 1987.

40 Schmidt et al. 1988. No data have yet tested the possibility that productivity diverges (the advantage enjoyed by the smarter employee increases with experience) in very-high-complexity jobs.

41 See also Schmidt et al. 1984.

42 See the discussion in note 15.

43 Burke and Frederick 1984; Hunter and Schmidt 1982; Hunter, Schmidt, and Judiesch 1990; Schmidt and Hunter 1983; Weekley et al. 1985. In the technical literature, the standard deviation of productivity measured in dollars is represented as SDy and has generally been estimated to average, over many different occupations, .4 times the average wage for the job. The corresponding figure as a proportion of the value of the average worker output is .2. Methods for estimating these distributions are discussed in the cited references, but they include such techniques as supervisor ratings of the dollar costs of replacing workers at various points in the distribution of workers, cost accounting of worker product, and scores on proficiency tests and at work sample stations.

44 Becker and Huselid 1992.

45 The more contemporary estimate would place this value at about $16,000 rather than $8,000. All the other dollar estimates of the benefits of testing mentioned in this section could similarly be doubled.

46 Hunter, Schmidt, and Judiesch 1990.

47 We use round numbers to make the calculations easy to follow, but these are in fact close to the current medians.

48 Hunter, Schmidt, and Judiesch 1990.

49 25,000 × .15 = 3,750; 100,000 × .5 = 50,000; 50,000/3,750 = 13.33.

50 100,000 × .5 × .6 = 30,000; 25,000 × .15 × .2 = 750.
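The arithmetic in notes 49 and 50 can be reproduced in a few lines. This is a sketch only: it assumes, without confirmation from the notes themselves, that 25,000 and 100,000 are annual wages, that .15 and .5 are standard deviations of output expressed as a proportion of the wage, and that .2 and .6 are test validities. The function names are hypothetical.

```python
def sd_of_output(wage, sd_ratio):
    """Dollar value of one standard deviation of output (assumed reading)."""
    return wage * sd_ratio

def gain_per_sd_of_test_score(wage, sd_ratio, validity):
    """Dollar gain per standard deviation of test score (assumed reading)."""
    return wage * sd_ratio * validity

low = sd_of_output(25_000, 0.15)
high = sd_of_output(100_000, 0.5)

# Note 49
print(round(low), round(high), round(high / low, 2))   # 3750 50000 13.33

# Note 50
print(round(gain_per_sd_of_test_score(100_000, 0.5, 0.6)))   # 30000
print(round(gain_per_sd_of_test_score(25_000, 0.15, 0.2)))   # 750
```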

51 There is another point illustrated by this exercise. Recall that a validity (correlation) “explains” only the amount of variance equal to its square; hence a validity of .4 explains only 16 percent of the variance, and this offers a temptation to dismiss the importance of intelligence as being of negligible economic consequences. And yet when we calculated the gains to be realized from an ability test that is less than perfectly valid as a predictor of proficiency, we multiplied the gain from a perfect test by the validity, not by the square of the validity. When trying to estimate how much of the value of a perfect selection procedure is captured by an imperfect substitute, the validity of the imperfect test is equal to the proportion of the value that is captured by it. A test with a validity of .4 captures 40 percent of the value that would be realized from a perfect test, even though it explains only 16 percent of the variance. Readers interested in the mathematical proof, which was first derived in the 1940s, will find it in Hunter and Schmidt 1982.
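The point of note 51, that the value captured scales with the validity rather than with its square, can be illustrated with a short calculation. The sketch below uses the standard textbook result for top-down selection under bivariate normality; it is not necessarily the derivation given in Hunter and Schmidt 1982, and the function name and the 20 percent selection ratio are assumptions chosen for illustration.

```python
from statistics import NormalDist

def mean_proficiency_of_selected(validity, selection_ratio):
    """Expected proficiency (in SD units) of applicants hired top-down.

    Assumes test score and proficiency are bivariate normal with
    correlation `validity`, and that the top `selection_ratio` share
    of applicants on the test is hired.
    """
    std = NormalDist()
    cutoff = std.inv_cdf(1 - selection_ratio)
    # Mean test z-score of those above the cutoff.
    mean_test_z = std.pdf(cutoff) / selection_ratio
    # Regression toward the mean scales the gain by the validity.
    return validity * mean_test_z

perfect = mean_proficiency_of_selected(1.0, 0.2)
imperfect = mean_proficiency_of_selected(0.4, 0.2)

print(round(imperfect / perfect, 2))   # 0.4  (share of the perfect-test gain)
print(round(0.4 ** 2, 2))              # 0.16 (share of variance explained)
```

Changing the selection ratio changes both gains but leaves their ratio equal to the validity.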

52 Two of the classic discussions of the conditions under which testing pays off are Brogden 1949 and Cronbach and Gleser 1965.

53 These correlations cover the empirical range in two senses. First, they bracket the values found in the technical literature dealing with the predictiveness of intelligence. Second, they bracket the various occupations, as described by Hunter, Schmidt, and their colleagues. More complex jobs have higher correlations between intelligence and proficiency, but almost all common occupations fall in the range between .2 and .6. The graphs assume normality of the predictor and outcome variables and a linear relation between them. None of these assumptions needs to be strictly met in order for the figure to give at least an approximately correct account of the relationships, nor are there any known deviations from normality or linearity that would materially alter the account.

54 We estimate the percentile values by assuming that proficiencies are normally distributed.

55 Hunter and Hunter 1984; Schmidt, Mack, and Hunter 1984.

56 Hartigan and Wigdor 1989; Hunter and Hunter 1984.

57 The data for the following description come from Herrnstein, Belke, and Taylor 1990.

58 Hunter 1979.

59 Murphy 1986.

Chapter 4

1 Juhn, Murphy, and Pierce 1990; Katz and Murphy 1990.

2 Twenty-three percent for sixteen or more years of education versus 11 percent for twelve or fewer years, according to Katz and Murphy 1990.

3 Freeman 1976.

4 The wage decline in the 1970s for highly educated workers and in the 1980s for less educated workers could conceivably have been due to declines in the quality of college education in the earlier period and in primary and high school education in the later period or in corresponding changes in the skills of people at those levels of education, as reflected, for example, in the decline of SAT scores (Bishop 1989). Economists assessing this hypothesis have concluded that it could not have played a major role (see Blackburn, Bloom, and Freeman 1990; Juhn, Murphy, and Pierce 1990; Katz and Murphy 1990).

5 The dramatic growth of female work force participation would necessitate complex modeling to address for the labor force as a whole the question here dealt with just for men.

6 Comparing men with sixteen or more years in school to those with fewer than twelve years gives a 26.8 percent differential and to those with twelve years in school gives 29.8. Since each category is being compared to its own baseline, this calculation understates the size of the change in actual real wages.

7 In a slightly different approach to the data, Kevin Murphy and Finis Welch, restricting the analysis to white workers, also found that more education had a shrinking wage benefit from 1963 to 1979, followed by a steeply rising benefit, but only for new workers. For experienced workers, the wage benefit for education did not decline during the earlier period, then rose more modestly thereafter. Work experience, in other words, dampened the wage benefit for education from the 1970s to the 1980s (Murphy and Welch 1989. See also Murphy and Welch 1993a, 1993b).

8 That intelligence is confounded with educational attainment is hardly a new idea. See Arrow 1973; Herrnstein 1973; Jencks et al. 1972; Sewell and Hauser 1975.

9 Juhn, Murphy, and Pierce 1990; Katz and Murphy 1990.

10 Public employment shielded workers, especially female workers, from the rising wage premium for education in the 1980s and the rising premium for unmeasured individual characteristics, presumably including intelligence. In the upper half of the wage distribution for highly educated workers, the ratio of federal to private wages declined from 1979 to 1988, even after corrections for race, age, and region of the country (Cutler and Katz 1991). The decline was especially large for women, perhaps because educated women were finding relatively more lucrative alternatives outside the government. For less educated workers in the lower half of the wage distribution, the ratio of federal to private wages rose during that interval, again especially for women. For state and local (as distinguished from federal) public employees, the rise in the ratio of public to private wages for less educated workers was larger still.

11 “Residual” in the regression analysis sense. After accounting for the effects of education, experience, gender, and their various interactions, a certain amount of real wage variance remains unexplained. This is the residual that has been growing.

12 Juhn, Murphy, and Pierce 1990; Katz and Murphy 1990; Levy and Murnane 1992.

13 Juhn, Murphy, and Pierce 1990.

14 Diligence, or conscientiousness, is one noncognitive trait that appears to earn a wage premium (Schmidt and Ones 1992). Drive, ambition, and sociability have been examined by Filer (1981). None of these has been as well established as cognitive ability, nor do they appear to be as significant in their economic effects.

15 Blackburn and Neumark 1991.

16 Blackburn and Neumark 1991. This study used the National Longitudinal Survey of Youth (NLSY), a database described in the Introduction to Part II.

17 Lest we convey the false impression that we are suggesting that education per se is immaterial, once intelligence is taken account of, we note two ingenious studies by economists Joshua Angrist and Alan Krueger (Angrist and Krueger 1991a, 1991b). They examined wages in relation to schooling for school dropouts born at different times of year and for people with varying draft lottery numbers. Dropouts in many states must remain in school until the end of the academic year in which they reach a given age. For people who want to drop out as soon as possible, those born in, say, October will spend a year in school more than those born in January. Likewise, during the Vietnam era, people whose only reason for staying in school was to avoid the draft would get more schooling if they had low lottery numbers, making them more likely to be drafted, than if they had high numbers. In both populations, the extra schooling showed a wage benefit later on. These findings show effects of education above and beyond personal traits like intelligence, if we assume that intelligence is uncorrelated with the month in which one is born or the lottery number. In fact, human births are moderately seasonal, and the seasonality differs across races, ethnic groups, and socioeconomic status, which may mean that births are seasonal with respect to average intelligence (Lam and Miron 1991). No such complication confounds the study using lottery numbers. Even so, the generality of these findings for populations other than school dropouts and for people who stayed in school only to avoid being drafted remains to be established.

18 Again from the NLSY. The sample chosen for this particular analysis was at least 30 years old, had been out of school for at least a year, and had worked fifty-two weeks in 1989 (from Top Decile Analysis). The median (as distinguished from the mean) difference in annual wages and salaries was much smaller: $3,000. A bulge of very-high-income individuals in these occupations among those with high IQs explains the gap between the mean and the median. For example, in these occupations, among those in the top decile of IQ, the 97.5th percentile of annual income was over $180,000; for those not in the top IQ decile, the corresponding income was $62,186.

19 The median wage for each occupation is the wage that has as many wages above it as below it in the distribution of wages in the occupation. A median expresses an average that is relatively insensitive to extreme values at either end.

20 A high IQ is also worth extra income outside the high-IQ occupations as we defined them. People not in the high-IQ occupations but with an IQ in the top 10 percent earned over $11,000 more in wages and salaries in 1989 (again in 1990 dollars) than those with IQs below the top decile. The median family income of those in the top IQ decile who did not enter the high-IQ professions was $49,000, putting them at the 72d percentile of family incomes.

21 Solon 1992; Zimmerman 1992. Women are not usually included in these studies because of the analytic complications arising in the recent dramatic changes in their work force participation. The correlation is even higher if the predictor of the son’s income is the family income rather than just the father’s (Solon 1992). These estimates of the correlation between father and son income represent a new finding. Until recently, specialists mostly agreed that income was not a strong family trait, certainly not like the family chin or the baldness that passes on from generation to generation, and not even as enduring as the family nest egg. They had concluded that the correlation between fathers and sons in income was between .1 and .2—very low. Expert opinion has, however, been changing. The older estimates of the correlation between fathers’ and sons’ incomes, it turns out, were plagued by two familiar problems that artificially depress correlation coefficients. First, the populations used for gathering the estimates were unrepresentative. One large study, for example, used only high school graduates, which no doubt restricted the range of IQ scores (Sewell and Hauser 1975). Another problem has been measurement error—in the case of intergenerational comparisons of income, measurement error introduced by basing the analysis on a single year’s income. Averaging income over a few years reduces this source of error. Now, using the nationally representative, longitudinal data in the National Longitudinal Survey (NLS) and the Panel Study of Income Dynamics (PSID), economists have found the correlations of .4 to .5 reported in the text.

22 Solon 1992. For comparable estimates for Great Britain, see Atkinson, Maynard, and Trinder 1983.

23 U.S. Bureau of the Census 1991b, Table 32.

24 Herrnstein 1973, pp. 197-198.

25 For reviews of the literature as of 1980, see Bouchard 1981; Plomin and DeFries 1980. For more recent analyses, on which we base the upper bound estimate of 80 percent, see Bouchard et al. 1990; Pedersen et al. 1992.

26 Plomin and Loehlin 1989.

27 The proper statistical measure of variation is the standard deviation squared, which is called the variance.

28 Heritability is a concept in quantitative genetics; for a good textbook, see Falconer 1989.

29 Social scientists will recognize the heritability question as being akin to the general statistical model of variance analysis.

30 Plomin and Loehlin 1989.

31 Bouchard et al. 1990.

32 Estimating heritabilities from any relationship other than for identical twins is inherently more uncertain because the modeling is more complex, involving the estimation of additional sources of genetic variation, such as assortative mating (about which more below) and genetic dominance and epistasis. See Falconer 1989.

33 For a broad survey of all kinds of data published before 1981, set into several statistical models, the best fitting of which gave .51 as the estimate of IQ heritability, see Chipuer, Rovine, and Plomin 1990. Most of the data are from Western countries, but a recent analysis of Japanese data, based on a comparison of identical and fraternal twin correlations in IQ, yields a heritability estimate of .58 (Lynn and Hattori 1990).

34 The extraordinary discrepancy between what the experts say in their technical publications on this subject and what the media say the experts say is well described in Snyderman and Rothman 1988.

35 Cyphers et al. 1989; Pedersen et al. 1992.

36 Cyphers et al. 1989; Pedersen et al. 1992.

37 Based primarily on a large study of Swedish identical and fraternal twins followed into late adulthood (Pedersen et al. 1992).

38 Plomin and Bergeman 1987; Rowe and Plomin 1981.

39 IQ is not the only trait with a biological component that varies across socioeconomic strata. Height, head size, blood type, age at menarche, susceptibility to various congenital diseases, and so on are some of the other traits for which there is evidence of social class differences even in racially homogeneous societies (for review, see Mascie-Taylor 1990).

40 The standard deviation squared times the heritability gives variance due just to genes; the square root of that number is the standard deviation of IQ in a world of perfectly uniform environments: $\sqrt{15^2 \times .6} = 11.6$. A heritability of .4 would reduce the standard deviation from the normative value of 15 to 9.5; with a heritability of .8, it would be reduced to 13.4.
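The three values in note 40 follow from the rule stated in its first sentence. A minimal Python sketch, with an illustrative function name that does not come from the source:

```python
import math

def sd_with_uniform_environment(sd, heritability):
    """SD remaining when only genetic variance is left.

    Genetic variance = SD squared times heritability; take its square root.
    """
    return math.sqrt(sd ** 2 * heritability)

for h in (0.4, 0.6, 0.8):
    print(h, round(sd_with_uniform_environment(15, h), 1))
# 0.4 9.5
# 0.6 11.6
# 0.8 13.4
```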

41 If we take the heritability of IQ to be .6, then the swing in IQ is 24 points for two children with identical genes, but growing up in circumstances that are at, say, the 10th and the 90th centile in their capacity to foster intelligence, a very large swing indeed. A less extreme swing from the 40th to the 60th centile in environmental conditions would move the average IQ only 4.75 points. In a normal distribution, the distance from the 10th to the 90th percentile is about 2.5 standard deviation units; from the 40th to the 60th percentile, it is about .5 standard deviation units. If the heritability is .8, instead of .6, then the swing from the 10th to the 90th percentile would be worth 17 IQ points, from the 40th to the 60th, 3.4 IQ points.
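The swings in note 41 can be approximated by treating the nongenetic share of variance as the environmental variance and multiplying the resulting standard deviation by the percentile span. The sketch below uses the note's own rounded spans (2.5 and .5 standard deviations); the function name is illustrative, and the results agree with the note's figures only to within rounding.

```python
import math

def environmental_swing(heritability, span_in_sd, sd=15):
    """IQ difference produced by moving `span_in_sd` environmental SDs.

    Assumes environmental variance = (1 - heritability) * total variance.
    """
    env_sd = sd * math.sqrt(1 - heritability)
    return env_sd * span_in_sd

print(round(environmental_swing(0.6, 2.5), 1))   # 23.7  (note: 24)
print(round(environmental_swing(0.6, 0.5), 1))   # 4.7   (note: 4.75)
print(round(environmental_swing(0.8, 2.5), 1))   # 16.8  (note: 17)
print(round(environmental_swing(0.8, 0.5), 1))   # 3.4   (note: 3.4)
```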

42 Burgess and Wallin 1943.

43 Spuhler 1968.

44 Jensen 1978. This estimate may be high for a variety of technical reasons that are still being explored, but apparently not a lot too high. For more, see DeFries et al. 1979; Mascie-Taylor 1989; Mascie-Taylor and Vandenberg 1988; Price and Vandenberg 1980; Watkins and Meredith 1981. In the 1980s, some researchers argued that data from Hawaii indicated a falling level of assortative mating for IQ, which they attributed to increased social mobility and greater access to higher education (Ahern, Johnson, and Cole 1983; Johnson, Ahern, and Cole 1980; Johnson, Nagoshi, and Ahern 1987). But the evidence seems to be limited to Hawaii. Other recent data from Norway and Virginia, not to mention the national census data developed by Mare and discussed in the text, fail to confirm the Hawaii data (Heath et al. 1985, 1987). When intelligence and educational level are statistically pulled apart, the assortative mating for education, net of intelligence, is stronger than that for intelligence, net of educational level (Neale and McArdle 1990; Phillips et al. 1988).

45 For a discussion of regression to the mean, see Chapter 15. The calculation in the text assumes a correlation of +.8 between the average child’s IQ and the midpoint of the parental IQs, consistent with a heritability of .6 and a family environment effect of .2. The estimate of average IQs in 1930 is explained in Chapter 1. The estimate for the class of 1964 (who were freshmen in 1960) is based on Harvard SAT-Verbal scores compared to the Educational Testing Service’s national norm study conducted in 1960, which indicates that the mean verbal score for entering Harvard freshmen was 2.9 SDs above the mean of all high school seniors and, by implication, considerably higher than that for the entire 18-year-old cohort (which includes the high school dropouts; Seibel 1962, Bender 1960). If we estimate the correlation between the SAT-Verbal and IQ as +.65 (from Donlon 1984), the estimated mean IQ of Harvard freshmen as of 1960 was about 130, from which the estimate of children’s IQ has been calculated.

46 With a parent-child correlation of .8, 64 percent of the variance is accounted for, 36 percent not accounted for. The square root of .36, which is .6, times 15, is the standard deviation of the distribution of IQ scores of the children of these parents. This gives a value of 9, from which the percentages in the text are estimated.
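The calculation in note 46 can be written out step by step, together with the regression-to-the-mean prediction that note 45 relies on. This is a sketch under stated assumptions: the function names are illustrative, and the midparent value of 130 is used only as an example input, taken from the estimate quoted in note 45.

```python
import math

def sd_of_children(parent_child_r, sd=15):
    """SD of children's IQ around the value predicted from their parents."""
    unexplained = 1 - parent_child_r ** 2   # .36 when r = .8
    return math.sqrt(unexplained) * sd      # .6 * 15

def expected_child_iq(midparent_iq, parent_child_r, mean=100):
    """Regression-to-the-mean prediction for the average child."""
    return mean + parent_child_r * (midparent_iq - mean)

print(round(sd_of_children(0.8), 1))            # 9.0
print(round(expected_child_iq(130, 0.8), 1))    # 124.0
```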

47 Operationally, Mare compared marriage among people with sixteen or more years of schooling with those who had fewer than sixteen years of schooling (Mare 1991, p. 23). For additional evidence of increasing educational homogamy in the 1970s and 1980s, see Qian and Preston 1993.

48 Oppenheimer 1988.

49 DES 1992, Tables 160, 168.

50 Buss 1987. For evidence that this phenomenon is well underway, see Qian and Preston 1993.

51 In the NLSY, whose members graduated from high school in the period 1976-1983, 59.3 percent had obtained a bachelor’s or higher degree by 1990. In the “High School and Beyond” study conducted by the Department of Education, only 44 percent of 1980 high school graduates who were in the top quartile of ability had obtained a B.A. or B.S. by 1986 (Eagle 1988a, Table 3).

52 See Chapter 1.

53 Authors’ analysis of the NLSY.

54 Authors’ analysis of the NLSY.

55 SAUS 1991, Table 17.

Introduction to Part II

1 Sussman and Steinmetz 1987. This is still a valuable source of information about myriad aspects of family life, mainly in America.

2 For example, in the last ten years, out of hundreds of articles and research notes, the preeminent economics journal, American Economic Review, has published just a handful of articles that call upon IQ as a way of understanding such problems. The most conspicuous exceptions are Bishop 1989; Boissiere et al. 1985; Levin 1989; Silberberg 1985; Smith 1984.

3 The criterion for eligibility was that they be ages 14 to 21 on January 1, 1979, which meant that some of them had turned 22 by the time the first interview occurred.

4 Details of the Department of Defense enlistment tests, the ASVAB, are also given in Appendix 3.

5 The test battery was administered to small groups by trained test personnel. That each NLSY subject was paid $50 to take the test helped ensure a positive attitude toward the experience.

6 See Appendix 3 for more on the test and its g loading, and the Introduction for a discussion of g itself.

7 Raw AFQT scores in the NLSY sample rose with age throughout the age cohorts who were still in their teens when they took the test. The simplest explanation is that the AFQT was designed by the military for a population of recruits who would be taking the test in their late teens, and younger youths in the NLSY sample got lower scores for the same reason that high school freshmen get lower SAT scores than high school seniors. However, a cohort effect could also be at work, whereby (because of educational or broad environmental reasons) youths born in the first half of the 1960s had lower realized cognitive ability than youths born in the last half of the 1950s. There is no empirical way of telling which reason explains the age-related differences in the AFQT or what the mix of reasons might be. This uncertainty is readily handled in the multivariate analyses by entering the subject’s birthdate as an independent variable (all the NLSY sample took the AFQT within a few months of each other in late 1980). When we present descriptive statistics, we use age-equated centiles.

8 We assigned the NLSY youths to a cognitive class on the basis of their age-equated centile scores. We use the class divisions as a way to communicate the data in an easily understood form. It should be remembered, however, that all of the statistical analyses are based on the actual test scores of each individual in the NLSY.

9 Regression analysis is only remotely related to the regression to the mean referred to earlier. See Appendix 1.

10 Age, too, is always part of the analytic package, a necessity given the nature of the NLSY sample (see note 7).

11 The white sample for the NLSY was chosen by first selecting all who were categorized by the interview screener as nonblack and non-Hispanic. From this group, we excluded all youths who identified their own ethnicity as Asian, Pacific, American Indian, African, or Hispanic.

Chapter 5

1 Ross et al. 1987. The authors used the sample tapes for the 1940 and 1950 census to calculate the figures for 1939 and 1949, antedating the beginning of the annual poverty statistics in 1959. The numbers represent total money income, including government transfers. The figure for 1939 is extrapolated, since the 1939 census did not include data on income other than earnings. It assumes that the ratio of poverty based on earnings to poverty based on total income in 1949 (.761) also applied in 1939, when 68.1 percent of the population had earnings that put them below the poverty line. Since government transfers increased somewhat in the intervening decade, the resulting figure for 1939 should be considered a lower bound.

It may be asked if the high poverty percentage in 1939 was an artifact of the Great Depression. The numbers are inexact, but the answer is no. The poverty rate prior to the Depression—defined by the contemporary poverty line—was higher yet. (See Murray 1988b, pp. 72-73).
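
The 1939 extrapolation described in note 1 amounts to a single multiplication. The sketch below assumes the .761 ratio is applied as a multiplier to the earnings-based poverty rate; that reading is an interpretation, since the note does not spell out the direction of the ratio. The function and variable names are illustrative only.

```python
def extrapolate_total_income_poverty(earnings_poverty_pct, ratio):
    """Scale an earnings-based poverty rate by an assumed ratio of
    total-income poverty to earnings-based poverty."""
    return earnings_poverty_pct * ratio

# Figures quoted in note 1; treating .761 as a multiplier is an assumption.
estimate_1939 = extrapolate_total_income_poverty(68.1, 0.761)
print(f"Estimated 1939 poverty rate: {estimate_1939:.1f} percent")
```

Under this reading the estimate is a little over half the population, which fits the note's description of it as a lower bound.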

2 See the introduction to Part II for more on the distinction between independent and dependent variables.

3 Jensen 1980, p. 281.

4 The observed stability of tests for children up to 10 years of age is reasonably well approximated by the formula [equation not reproduced], where r11 and r22 are the reliabilities of the tests on occasions 1 and 2, CA1 and CA2 are the subject’s chronological age on occasions 1 and 2, and r12 is the correlation between a test taken and retaken at ages CA1 and CA2. See Bloom 1964 for a full discussion.

5 After age 10, the correlation of test scores will usually fall between the product of the reliabilities of the two tests and the square root of their product. Thus, for example, the correlation of two measures of IQ after age 10 when both tests had reliabilities of .9 may be expected to fall between .81 and .9. Since the best IQ tests have reliabilities in excess of .9, this is tantamount to saying that the stability of scores is quite high. Following are some sample reliabilities as reported in the publisher’s test manuals: WISC = .95, WAIS = .97, Wonderlic Personnel Test = .95. The reliabilities of some of the major standardized achievement tests are also extremely high. For example: ACT = .95, SAT = .90+, California Achievement Tests = .90-.95, Iowa Test of Basic Skills Composite = .98-.99. For a longer list of reliabilities and an accessible discussion of both reliability and stability, see Jensen 1980, Chap. 7.
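
The bounds described in note 5 can be checked with a few lines of arithmetic. This is a minimal sketch of the rule as the note states it (the product of the two reliabilities and the square root of that product); the function name is hypothetical.

```python
import math

def stability_bounds(r11, r22):
    """Return the (lower, upper) range the note gives for a retest
    correlation after age 10: the product of the two reliabilities
    and the square root of that product."""
    product = r11 * r22
    return product, math.sqrt(product)

# The note's own example: both tests have a reliability of .9.
low, high = stability_bounds(0.9, 0.9)
print(f"Expected retest correlation: {low:.2f} to {high:.2f}")
```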

6 Is there reason to think that, had the test been administered earlier, at age 7 or 8, the results would have turned out differently? The answer, with some reservations, is no. We would observe the normal level of fluctuation in tests administered at ages 7 and 20, with some individuals scoring higher and some lower as they grow up. The correlations between a person’s IQ obtained at age 7 and social behavior in adulthood would support the same qualitative conclusions as those based on an IQ obtained at age 20. The correlations using the younger scores would be smaller, because they measure the adult trait of intelligence less reliably than a score obtained later in life. See Appendix 3 for a discussion of changes in IQ among the members of the NLSY sample.

7 Himmelfarb 1984.

8 E.g., Ryan 1971.

9 For a few words about regression analysis, see the Introduction to Part II and Appendix 1. In fewer words still, this is a method for assessing the independent impact of each of a set of independent variables on a dependent variable. The specific form used here is called logistic regression analysis, the appropriate method for binary dependent variables, such as yes-no or female-male or married-unmarried.

10 We eliminate students to avoid misleading ourselves with, for example, third-year law students who have low incomes in 1989 but are soon to be making high incomes.

11 Note a distinction: Age has an important independent effect on income (income trajectories are highly sensitive to age), but not on the yes-no question of whether a person lives above the poverty line. It is also worth noting that age in the NLSY is restricted in range because the sample was all born within a few years of each other.

12 The imaginary person is sexless.

13 We refrain from precise numerical estimates of how much more important IQ is than socioeconomic background, for two reasons. First, they are not essential to the point of this discussion. Second, doing so would get us into problems of measurement and measurement error that would needlessly complicate the text. It seems sufficient for our purpose to note that IQ has a greater impact on the likelihood of being poor than socioeconomic background, as those variables are usually measured.

14 The 1991 poverty rate for persons 15 and over was 11.9 percent, compared to 22.4 percent for children under 15. U.S. Bureau of the Census, 1992, Table 1.

15 For an analysis of the demographic reasons and some measurement issues, see Smith 1989.

16 U.S. Bureau of the Census 1992, Table C, p. xiv.

17 U.S. Bureau of the Census 1992, Table C, p. xiv.

18 Eggebeen and Lichter 1991; Smith 1989.

19 Given childless white men and women of average age, socioeconomic background, and IQ, the expected poverty rates are only 1.6 percentage points apart and are exceedingly low in both cases: 3.1 and 4.7 percent, respectively.

20 The relationships of IQ to poverty were statistically significant beyond the .01 level for both married and unmarried women. Our policy throughout the book is not routinely to report significance statistics, but at the same time not to present any relationship as being substantively significant unless we know that it also is statistically significant.

21 An entire draft of the book was written using a different measure of IQ. As described in Appendix 3, the armed forces changed the scoring system for the AFQT in 1989. The first draft was written using the old version. After discussing the merits of the old and new measures at length, we decided to switch to the new one, because, for arcane reasons, it is psychometrically superior. The substantive effects of this change on the conclusions in the book are, as far as we can tell, effectively nil. All of the analyses have also been repeated with two versions of the SES index, and many of them with three. Again, the three versions yielded substantively indistinguishable results. But each of the successive versions of the SES index was, in our judgment, a theoretically more satisfying and statistically more robust way of capturing the construct of “socioeconomic status.”

Regarding the specific analysis of the role of gender and marital status in mediating the relationship between IQ and poverty: Originally, the analysis (and the graphic included in the text on page 138) was based on married/unmarried, men/women. Then we looked more closely at women and their various marital situations, then at those marital situations for women with children. All of the poverty analyses were conducted with two measures of poverty: the official definition (represented in this book), and a definition based on cash income obtained from sources other than government transfers. We decided to present the results using the official definition to avoid an extra layer of explanation, but we have the comfort of knowing that the interpretation fits both definitions, except for a few nuances that are not important enough to warrant a place in this concise an account. We have conducted some of these analyses for age-restricted samples, to see if things change for older cohorts in ways that are not captured by using age as an independent variable in the regression equation. Throughout all of these regression analyses, we were also looking at cross-tabulations and frequency distributions to try to see what gnomes might be lurking in the regression coefficients. Finally, we duplicated all of the analyses you see with and without sample weights, to ensure that there were no marked, mysterious differences in the two sets of results. There were undoubtedly other iterations and variations that we have forgotten over the last four years.

None of this will be surprising to our colleagues, for the process we have described is SOP for social scientists engaged in complex analyses. But for nonspecialists, the story is worth remembering. It should make you more skeptical, insofar as you understand that such enterprises are not as elegant and preordained as authors (including us) sometimes make it sound. But the story can also give you some additional confidence, insofar as, when you find yourself wondering whether we considered such-and-such an alternative way of looking at the data, the chances are fairly good that we did.

22 In passing, it just isn’t so for blacks either. The independent roles of poverty and socioeconomic status are almost exactly the same for blacks in the NLSY as for whites. See Chapter 14.

Chapter 6

1 Kronick and Hargis 1990.

2 For a discussion of definitional issues in measuring the dropout rate, see Kominski 1990.

3 Most people get their high school degrees or equivalences later than at the age of 17, so the figure on page 144 implicitly overestimates the proportion of dropouts in the population as a whole, at least for recent times. In 1985, the U.S. General Accounting Office estimated that 13 percent of the population between the ages of 16 and 24 could be characterized as school dropouts, which amounted to 4.3 million people (cited by Hahn and Lefkowitz 1987; Kronick and Hargis 1990). Dropout rates in some locales may differ markedly from the national averages. In Boston, for example, dropping out of the public schools (as distinguished from losses due to transferring out of the school system) has recently risen above 45 percent (Camayd-Freixas and Horst 1987).

4 In 1990, the percentage of persons ages 25 to 29 who had completed four years of high school or more was 85.7 percent, higher than the plotted “graduation ratio,” which is based on 17-year-olds (National Center for Education Statistics 1992, Table 8).

5 Quoted in Clignet 1974, p. 38. See Chapter 22 for additional discussion.

6 Tildsley 1936, p. 89.

7 These numbers represent an unweighted mean of the six studies of ninth graders and the nine studies of students who were either seniors or graduates. When sample sizes are taken into account, the (weighted) means for the two groups are 104.2 and 105.5 (Finch 1946, Table I, pp. 28-29). This may understate the degree of difference between the dropout and the high school senior. Other studies indicate that within any given school, a statistical relationship existed between IQ and the likelihood of finishing high school. In urban areas, the size of the correlation itself could be substantial. In one of the best such studies, Lorge found for the city of New York in the 1930s that the correlation of IQ with highest completed grade was +.66 (Lorge 1942). Some of the individual studies of specific high schools conducted during that period reviewed by Finch also showed larger differences. But those studies tended to be subject to a number of technical errors. Even giving substantial weight to them, the difference between the mean IQ of the high school dropout and youths who made it to the senior year during the 1920s was considerably less than half a standard deviation (7.5 IQ points). Perhaps children who dropped out before the ninth grade had somewhat lower IQs, so that the overall difference between diploma holders and dropouts was larger than the difference between ninth graders and twelfth graders. The data on this issue for the first half of the century are fragmentary, however.

8 If a third dropped out between ninth grade and twelfth grade, their average IQ must have been 101, compared to 107 for the seniors and graduates; if half dropped out, it must have been 103. Assuming a population average of 100, this implies that those who dropped out prior to ninth grade had still lower scores than those who dropped out afterward.
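
The arithmetic behind note 8 can be sketched as solving a weighted average for the unknown dropout mean. The ninth-grade mean of 105 used below is not stated in the note itself; it is inferred here as the value that reproduces both of the note's results, and the function name is hypothetical.

```python
def implied_dropout_mean(ninth_grade_mean, senior_mean, share_dropped):
    """Solve ninth_grade_mean = share * dropout_mean + (1 - share) * senior_mean
    for dropout_mean."""
    return (ninth_grade_mean - (1 - share_dropped) * senior_mean) / share_dropped

# Assumed ninth-grade mean of 105 (inferred, not quoted); senior mean of 107 is from note 8.
if_third = implied_dropout_mean(105, 107, 1 / 3)
if_half = implied_dropout_mean(105, 107, 1 / 2)
print(f"A third dropped out: {if_third:.0f}; half dropped out: {if_half:.0f}")
```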

9 Iowa State Department of Public Instruction, 1965.

10 Dillon 1949, quoted in Jensen 1980, p. 334.

11 Based on a comparison of the academic aptitude scores of the ninth graders in the sample who had and had not graduated from high school five years later. The IQ equivalents are computed from a graduate-dropout gap of 1.14 standard deviations (SDs) for boys and 1.00 SDs for girls, or approximately 1.05 SDs overall (Wise et al. 1977, Table A-3). In the late 1960s, the Youth in Transition study found a difference of about .8 SDs on the vocabulary subtest of the GATB and the Gates Reading Tests between dropouts and nondropouts, consistent with a 1 SD difference on a full-scale battery of tests (reconstructed from Table 6-1, p. 100, and Tables C-3-7 and C-3-8 in Bachman et al. 1971).

12 Looking at these numbers, some readers will be wondering how much these dropout figures represent cause and how much effect. After all, wouldn’t a person who stayed through high school and then took the IQ test have gotten a higher score by virtue of staying in high school? This question of cause and effect may be raised with all of the topics using the NLSY, but it is most obvious for school dropout. But while age has an effect on AFQT scores and is always taken into account (either through age-equated scores in the descriptive statistics or by entering age as an independent variable in the regression analyses), there is no reason to think that presence in school is decisive. The simplest way to document this is by replicating the analyses for a restricted sample of youths who were age 16 and under when they took the test, thereby excluding almost all of the members of the sample who might create these artifacts. Having done so for all of the results reported in this chapter, we may report that it makes no difference in terms of interpretations. We will not present all of these duplicate results, but an example will illustrate.

Using the full sample of whites, the mean IQs, expressed in standard scores, of those who completed high school via the normal route, those who got a high school equivalency, and those who dropped out permanently were +.37, -.14, and -.94 respectively. For whites who took the AFQT before they were age 17, the comparable means were +.34, -.04, and -.95. The main effect of using the age-restricted sample is drastically to reduce sample sizes, which we judged to be an unnecessary sacrifice. The NLSY data are consistent with other investigations of this issue (e.g., Husén and Tuijnman 1991). Continued schooling makes a modest contribution to intellectual capital but not enough to make much difference in the basic relationships linking IQ to other outcomes. Chapter 17 specifically discusses the impact of schooling on IQ, and Appendix 3 elaborates on the relationship of schooling to IQ in the NLSY.

13 Other data confirm this general picture. In the High School and Beyond national sample conducted by the Department of Education in 1980, it was found that those in the lowest quartile on the cognitive ability test dropped out at a rate of 26.5 percent, compared to 14.7 percent, 7.8 percent, and 3.2 percent in the next three quartiles, respectively (Barro and Kolstad 1987, Table 6.1, p. 46). Similar results have been found in other recent studies of dropouts and cognitive ability (e.g., Alexander et al. 1985; Hill 1979). Comparable rates of dropping out across the IQ categories and across categories defined by vocabulary test scores were also found in the earlier Youth in Transition study, based on approximately 2,000 men selected to be representative of the national population in the tenth grade in 1967 (Bachman et al. 1971). For an estimate of the loss in cognitive ability that may be attributed to dropout itself, see Alexander et al. 1985.

14 The General Educational Development exam is administered by the American Council on Education.

15 Cameron and Heckman 1992.

16 DES 1991, Tables 95, 97. In the NLSY, 9.5 percent of those classified as having a high school education got their certification through the GED.

17 As depicted in, for example, Coles 1967, in his work on certain impoverished populations. The relative roles of socioeconomic background and IQ found in the NLSY are roughly comparable to those found for the Youth in Transition study based on students in the late 1960s, though the method of presentation in that study does not lend itself to a precise comparison (Bachman et al. 1971, Chap. 4-6).

18 In passing, it may be noted that these results hold true for blacks as well. Of the blacks in the NLSY who permanently dropped out of school, none was in the top quartile of IQ. Only nine-tenths of 1 percent of black permanent dropouts were in the top half of IQ and the bottom half of SES. See Chapter 14 as well.

19 In a logistic regression, with all independent variables expressed as standard scores, the coefficients for IQ, SES, Age, and the SES × IQ interaction term were 1.91, .98, -.06, and .32, respectively. The intercept was 2.81. The interaction term was significant at the .005 level, and r² = .38. The equation is predicting “true” for a binary variable denoting high school graduation (with permanent dropout as the “false” state).
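
To make the coefficients in note 19 concrete, the sketch below turns them into a predicted probability of graduation. It assumes the standard logistic form, in which the coefficients apply to the log-odds and the interaction term is the product of the standardized IQ and SES scores. The note does not print the equation, so this form and the function name are assumptions.

```python
import math

# Coefficients as quoted in note 19 (all predictors in standard-score units).
COEF = {"intercept": 2.81, "iq": 1.91, "ses": 0.98, "age": -0.06, "ses_x_iq": 0.32}

def graduation_probability(iq_z, ses_z, age_z=0.0):
    """Predicted probability of high school graduation under an assumed
    standard logistic model with an IQ-by-SES product interaction."""
    logit = (COEF["intercept"]
             + COEF["iq"] * iq_z
             + COEF["ses"] * ses_z
             + COEF["age"] * age_z
             + COEF["ses_x_iq"] * iq_z * ses_z)
    return 1.0 / (1.0 + math.exp(-logit))

# Compare a person at the mean on every variable with one 2 SDs below the mean in IQ.
for iq_z in (0.0, -2.0):
    print(f"IQ z = {iq_z:+.1f}, SES z = 0: p = {graduation_probability(iq_z, 0.0):.3f}")
```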

20 Press accounts of the GED population suggest that the typical youngster in it had trouble with the routine of ordinary school and comes from uncommonly deprived family circumstances (e.g., Marriot 1993).

21 Matarazzo 1972, pp. 178-180.

22 The percentages were 68 and 23, respectively.

Chapter 7

1 The figure on page 156 also echoes some of the large macroeconomic forces that we did discuss in preceding chapters. To some extent, the pool of “16-19-year-olds not in school” has changed as high schools have retained more students longer and colleges have recruited larger numbers of the brightest into college. As the pool has changed, so perhaps has the employability of its members. The greater employment problems shown by the figure also fit in with the discussion about earnings in Chapter 4 and the way in which income has stagnated or fallen for those without college educations. For concise reviews of the empirical literature on labor supply and unemployment, see Heckman 1993; Topel 1993. Studies focused on young disadvantaged men include Wolpin 1992; Cogan 1982; Bluestone and Harrison 1988; Cohen 1973; Holzer 1986. There is, of course, a large literature devoted explicitly to blacks. See Chapters 14 and 20.

2 We conducted parallel analyses with a sample based on the most recent year of observation (back to 1984), which enabled us to include data on some men who were being followed earlier but subsequently disappeared from the NLSY sample. The purpose was to compensate for a potential source of attrition bias, on the assumption that men who disappeared from the NLSY sample might be weighted to some degree toward those with the fewest connections to a fixed address and (by the same token) to the labor market. The results obtained by this method were substantively indistinguishable from the ones reported.

3 We replicated all of the analyses using the actual number of weeks out of the labor force as the dependent variable instead of a binary yes-no measure of whether any time was spent out of the labor force. The relative roles of the independent variables were the same as in the reported analyses, with similar comparative magnitudes as well as the same signs and levels of statistical significance. The relationship, such as it is, does not seem to be concentrated among the children of the very wealthy.

4 A more fine-grained examination of the data reveals that absence from the labor force and job disabilities is extraordinarily concentrated within a limited set of the lowest-status jobs. Using a well-known index of job prestige, the Duncan index, 46 percent of the reports of job limitations and 63 percent of those who reported being prevented from working (but who were still listing an occupation) came from jobs scored 1 to 19 on the Duncan scale, which ranges from 1 to 100. A total of 975 white men in the NLSY listed such a job as their occupation in 1990. The five most common jobs in this range, accounting for 35 percent of the total, were truck driver, automobile mechanic, construction laborer, carpenter, and janitor. Another 299 white males working in blue-collar jobs scored 20 to 29 on the Duncan scale. The five most common jobs in this range, accounting for 37 percent of the total, were welder, heavy equipment mechanic, other mechanic and repairman, brick mason, and farmer. Another 158 white males were working in blue-collar jobs scored 30 to 39 on the scale. The five most common jobs in this range, accounting for 47 percent of the total, were delivery man, plumber and pipefitter, machinist, sheet metal worker, and fireman.

Looking over these jobs, it is not readily apparent that the lowest-rated jobs in terms of prestige are also the physically most dangerous or demanding. Construction work fits that description in the lowest category, but so do fireman, sheet metal worker, and others in the higher categories. Meanwhile, some of the jobs in the lowest category (e.g., truck driver, janitor) are not self-evidently more dangerous or physically demanding than some jobs in the higher categories. Or to put it another way: If a third party were given these fifteen job titles and told to rank them in terms of potential accidents and the importance of physical fitness, it is unlikely that the list would also be rank-ordered according to the job prestige index or even that the rank ordering would have much of a positive correlation with the job prestige index.

Instead, the index was created based on the pay and training that the jobs entail—both of which would tend to give higher ratings to cognitively more demanding jobs. And so indeed it works out. Here are the mean IQ scores of white males in blue-collar jobs, subdivided by groups on the Duncan scale, alongside the number per 1,000 who reported some form of job-related health limitation in 1989:

Table columns: Duncan Scale Score (Limited to Blue-Collar Occupations); Mean IQ Percentile; No. per 1,000 with Job-Related Health Disability. [Table values not reproduced.]

In short, the results of the regression analysis indicating that IQ has an important relationship to job disability even among blue-collar jobs, and even after taking age and years of education into account, are not explained away by the differences in the physical risks of these occupations. The same conclusion holds true when the analysis is conducted only for blue-collar workers and the variable “years of education” is added to the equation. The coefficient relating IQ to likelihood of disability is about four times the coefficient for years of education (with age as the other independent variable constant). Intriguingly, the opposite is true when the analysis is conducted just for white-collar workers: Years of education is important, wiping out any independent role for IQ. Interpreting this is difficult, both because health disability is such a rare phenomenon among white-collar workers and because IQ becomes so tightly linked to advanced education, which in turn is associated with jobs in which physical disability is virtually irrelevant (short of a stroke or other accident causing a mental impairment).

5 Terman and Oden 1947.

6 Hill 1980; Mayer and Treat 1977; O’Toole 1990; Smith and Kirkham 1982.

7 Grossman 1976; Kitagawa and Hauser 1960.

8 Restriction of range (see Chapter 3) might also reduce the independent role of IQ among college graduates.

Chapter 8

1 For a review of the literature about family decline, see Popenoe 1993.

2 U.S. Bureau of the Census 1992, Table 51.

3 Retherford 1986.

4 Garrison 1968; James 1989.

5 The cognitive elite did get married at somewhat older ages than others, and this difference will grow as the NLSY cohort gets older. Judging from other data, almost all of those in the bottom half of the IQ distribution who will ever marry have already married by 30, whereas many of that 29 percent unmarried in Class I will eventually marry, raising their mean age of marriage by some unknown amount. If all of them married at, say, age 40, the average age at marriage would approach 30, which may be taken as the highest mean that the NLSY could plausibly produce as it follows its sample into middle age.

6 In his famous lifetime study of intellectually gifted children born around 1910, Lewis Terman found that, as of the 1930s and 1940s, highly gifted men eventually got married at higher rates than the national norms: about 84 percent, compared to a national rate of 67 percent for men of similar age. Gifted women married later than the average woman, but by their mid-30s they too had higher marriage rates than the general population, though the difference was not as great as for men: 84 percent compared to 78 percent (Terman and Oden 1947, p. 227).

7 Cherlin 1981, Figure 1-5. His estimation procedure suggests that the odds of eventual divorce in 1980 were 54 percent. Also see Raschke 1987.

8 We are here calculating odds ratios—the likelihood of marital survival divided by the likelihood of divorce within the first five years—from the table on page 174. The ratio of odds ratios for marital survival versus divorce during the first five years of marriage was 2.7, comparing Class I to Class V.

9 In addition to the standard variables (age, parental socioeconomic status, and IQ), we added “date of first marriage.” We wished to add age at first marriage as well, but it was so highly correlated with the date of first marriage in the entire white sample (r = +.81) that the two variables could not be used together. It was possible to use them together in some of the subsamples we analyzed. The pattern of results was unchanged.

10 Different subsets of white youths, both the entire sample of those who had married and the subset of those who had reached the age of 30, and the subset below the age of 30 all yielded similar results.

11 E.g., Raschke 1987; Sweet and Bumpass 1987.

12 Higher socioeconomic status is also associated with a lower probability of divorce in the college sample, though the independent effect of parental SES is much smaller than the independent effect of IQ. Socioeconomic status had an insignificantly direct relationship with divorce for the high school sample. Thinking back to the analysis of marriage, note a curious contrast: IQ makes a lot of difference in whether high school graduates get married but not in whether they get divorced. IQ makes little difference in whether college graduates get married by the age of 30 but a lot of difference in whether they get divorced. Why? We have no idea. In any case, embedded in this complicated set of findings are intriguing possibilities, which warrant a full-scale analysis.

13 Raschke 1987; Sweet and Bumpass 1987; Teachman et al. 1987.

14 Even a genetic component has been invoked to explain the fact that divorce runs in families. Not only do children tend to follow their parents’ path toward divorce, but identical twins are more correlated in their likelihood of divorce than fraternal twins, a difference that often betrays some genetic influence. McGue and Lykken 1992.

15 Those living with only the father did as well as those living with both biological parents.

16 See references in Raschke 1987; South 1985.

17 Bronislaw Malinowski, Sex, Culture, and Myth (1930), quoted in Moynihan 1986, p. 170.

18 The production of illegitimate babies per unit population has also increased during this period, with the fastest growth occurring during the 1970s. In the jargon, the rate of illegitimate births has increased as well as the ratio. The distinction between rate and ratio raises a technical issue that has plagued the discussion of illegitimacy in recent years. Traditionally, illegitimacy rates have been computed by dividing the number of illegitimate births by the number of unmarried women. In a period when marital patterns are also shifting, this has the effect of confounding two different phenomena: the number of illegitimate births in the numerator of the ratio and the number of unmarried women in the denominator. To estimate the rate of change in the production of illegitimate children per unit population, it is essential to divide the number of illegitimate births by the entire population (or, if one prefers, by the number of women of childbearing age). This is almost never done, however, in nontechnical discussions (or in many of the technical ones, for that matter). For a discussion of the difference this makes in interpreting trends in illegitimacy, see Murray 1993.

19 Sweet and Bumpass 1987, p. 95. In 1960, there were 73,000 never-married mothers between the ages of 18 and 34; in 1980, there were 1,022,000.

20 Bachu 1991, Table 1. The figures for ages 18 to 34 are interpolated from the published figures for ages 15 to 34.

21 Not to mention that IQ has changed in the wrong direction to explain increasing illegitimacy (see the Flynn Effect, discussed in Chapters 13 and 15).

22 As in the case of school dropout, one may ask whether having a baby out of wedlock as a teenager caused school dropout, therefore resulting in an artificially low IQ score. As before, the cleanest way to test the hypothesis is to select all the women who had their first baby after they took the test in 1980 and repeat the analyses reported here, introducing a control for age at first birth. When this is done, the relationships reported continue to apply as strongly as, and in some cases more strongly than, they do for the entire sample.

A similar causal tangle is associated with the age at first birth. Age at first birth is a powerful explanatory variable in a statistical sense. It can drastically change the parameters, especially the importance of socioeconomic status and IQ, in a regression equation. But, in the 1990s, what causes a girl in her teens to have a baby? Probably the same things that might cause her to have an illegitimate baby: She grew up in a low-status household where having a baby young was an accepted thing to do; she is not very bright and gets pregnant inadvertently or because she has not thought through the consequences; or she is poor and has a baby because it offers better rewards than not having a baby, whether those rewards are tangible in the form of an income and apartment of her own through welfare, or in the form of having someone to love. And in fact all three variables—parental SES, IQ, and whether she was living in poverty prior to the birth—are powerful predictors of age at first birth, explaining 36 percent of the variance. Furthermore, age at first birth cannot be a cause of parental SES and poverty in the year prior to birth. Empirically, it can be demonstrated not to be a “cause” of the AFQT score, using the same logic applied to the case of illegitimacy.

23 Rindfuss et al. 1980.

24 Abrahamse et al. 1988. The analysis is based on a sample of 13,061 girls who were sophomores in 1980 at the time of the High School and Beyond (HS&B) baseline survey and also responded to the first follow-up questionnaire in 1982.

25 The exact figures, going from the bottom to the top quartile in socioeconomic status, are 38.7 percent, 29.7 percent, 19.9 percent, and 11.7 percent, based on weighted data, computed by the authors from the HS&B database. Figures reported here and on other occasions when we refer to the RAND study will sometimes show minor discrepancies with the published account, because Abrahamse et al. used imputed figures for certain variables, based on schoolwide measures, when individual data were missing. Our calculations do not use any imputed figures. As in the RAND study, all results are based on weighted analyses using the HS&B population weights.

26 For mothers of an illegitimate baby, the mean on the test of cognitive ability was .73 SD below the mean for all girls who had babies, and .67 SD below the mean for all white girls (mothers and nonmothers).

27 Limiting the analysis to first births avoids a number of technical problems associated with differential number of children per woman by cognitive and socioeconomic class. Analyses based on all children born by the 1990 interview show essentially the same results, however. We also conducted a parallel set of analyses using as the dependent variable whether the woman had ever given birth to a child out of wedlock (thereby adding women without any children at all to the analysis). The interpretations of the results were not markedly different for any of the analyses presented in the text.

28 We are, as usual, comparing the effects of a shift equal to ±2 SDs around the mean for both independent variables, cognitive ability and socioeconomic status.
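
As a rough illustration of the ±2 SD comparison, the sketch below evaluates a model at standardized scores of -2 and +2 for one variable while holding the other at its mean. The logistic form, the function name, and every coefficient are assumptions chosen for illustration; none is taken from the NLSY analyses.

```python
import math

def predicted_probability(intercept, b_iq, b_ses, z_iq=0.0, z_ses=0.0):
    """Probability from a logistic model with two standardized predictors."""
    logit = intercept + b_iq * z_iq + b_ses * z_ses
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical coefficients (illustrative only, not estimated from any data).
INTERCEPT, B_IQ, B_SES = -1.5, -0.6, -0.2

# Shift IQ from -2 SD to +2 SD with SES held at its mean (z = 0).
p_iq_low = predicted_probability(INTERCEPT, B_IQ, B_SES, z_iq=-2.0)
p_iq_high = predicted_probability(INTERCEPT, B_IQ, B_SES, z_iq=2.0)

# Shift SES from -2 SD to +2 SD with IQ held at its mean (z = 0).
p_ses_low = predicted_probability(INTERCEPT, B_IQ, B_SES, z_ses=-2.0)
p_ses_high = predicted_probability(INTERCEPT, B_IQ, B_SES, z_ses=2.0)

print(f"IQ  -2 SD: {p_iq_low:.3f}   +2 SD: {p_iq_high:.3f}")
print(f"SES -2 SD: {p_ses_low:.3f}   +2 SD: {p_ses_high:.3f}")
```

Because both variables are on the same standardized scale, the two ranges of predicted probabilities can be compared directly.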

29 Bachu 1993, Table J.

30 Bachu 1993, Table J.

31 The comparable probabilities given parental SES standard scores of -2 and +2 were 31 percent and 19 percent.

32 The literature is extensive. Two recent reviews of the literature are Moffitt 1992 and Murray 1993. See also Murray 1994.

33 The writing on this topic is much more extensive for the black community than the white. See, for example, Anderson 1989; Duncan and Hoffman 1990; Furstenberg et al. 1987; Hogan and Kitagawa 1985; Lundberg and Plotnick 1990; Rowe and Rodgers 1992; Teachman 1985; Moffitt 1983.

34 For a detailed presentation of this argument, see Murray 1986b.

35 An analysis based not on the dichotomous variable, poverty, but on income had essentially the same outcome.

36 When we repeat the analysis yet again, adding in the presence of the biological father, these results are sustained. Poverty and cognitive ability remain as important as before; the parents’ poor socioeconomic status does not increase the chances of illegitimate babies.

Chapter 9

1 Louchheim 1983, p. 175. See also Liebmann 1993.

2 Bane and Ellwood 1983; Ellwood 1986b; Hoffman 1987.

3 The studies are reviewed in Bendick and Cantu 1978.

4 Hopkins et al. 1987.

5 This figure includes women not reflected in the table who did not go on AFDC within the first year after birth, received welfare at some later date, but did not become chronic recipients.

6 In all cases, we limit the analysis to women for whom we have complete data and whose child was born prior to January 1, 1989. We also conducted this analysis with another definition of short-term recipiency, limiting the sample to women whose children had been born prior to 1986, divided into women who had never received welfare subsequently and women who had received welfare up to half of the years that they were observed but did not qualify as chronic welfare recipients. The results were similar to the ones reported in the text, with a large negative effect of IQ and an insignificant role for SES.

7 Bane and Ellwood 1983; Ellwood 1986a; Murray 1986a.

8 Ellwood 1986a; Murray 1986a.

9 We conducted a parallel analysis comparing chronic welfare recipients with all other mothers, including those who had been on welfare but did not qualify as chronic. There are no important differences in interpretation for the results of the two sets of analyses.

10 Among all white women, only 16 percent had not gotten a high school diploma, and 27 percent had achieved at least a bachelor’s degree.

11 Once again, this analysis has to be based on women with a high school diploma because there was no way to analyze welfare recipiency among white women with B.A.s. Only two white women with B.A.s in the NLSY had become chronic recipients. But for the high school graduates, the effect of parental SES is modest—slightly smaller than the independent effect of cognitive ability. This pattern was generally shared among women who had gone on to get their GED (recall that people with a GED are not included in the high school sample).

12 Some of the obvious explanations are not as important as one might expect. For example, most of the high school dropouts who became chronic welfare recipients were not poor; only 36 percent of them had been below the poverty line in the year before birth. Nor is it correct to assume that all of them had babies out of wedlock; nearly half (46 percent) of their first babies had been born within marriage. But 70 percent of the chronic welfare recipients among the high school dropouts had had their first child before they turned 19, which means that some very large proportion of them had the baby before they would normally have graduated. Among high school dropouts who had not had a child before their nineteenth birthday, the independent relationships of IQ and socioeconomic status shift back toward the familiar pattern, with the effects of IQ being much larger than those of socioeconomic status.

13 Indeed, the teenage mothers who did not become chronic welfare recipients had a slightly lower mean IQ than those who did (23d centile versus 26th centile). Meanwhile, the ones who did not become welfare recipients at all had a fractionally higher mean socioeconomic status than the ones who did (27th centile versus 26th).

14 Having a high school diploma was an important variable in all of the analyses of welfare, over and above the effects of either cognitive ability or socioeconomic background, and regarding either short-term or chronic welfare recipiency. The question is whether the high school diploma—and we are referring specifically to the high school diploma, not an equivalency degree—reflects a cause or a symptom. Does a high school education prepare the young woman for adulthood and the world of work, thereby tending to keep her off welfare? Or does the act of getting a high school diploma reflect the young woman’s persistence and ability to cope that tend to keep her off welfare? It is an important question; unfortunately, we were unable to think of a way to answer it with the data we have.

15 All are mutually exclusive groups. Criteria follow those for temporary and chronic welfare recipiency defined earlier.

Chapter 10

1 Anderson 1936.

2 See Bronfenbrenner 1958, p. 424, for a review of the literature through the mid-1950s. For a recent empirical test, see Luster et al. 1989.

3 Kohn 1959.

4 Kohn 1959.

5 Kohn 1959, p. 366.

6 Heath 1983.

7 The study also includes “Trackton,” a black lower-class community.

8 Heath 1982, p. 54.

9 Heath 1981, p. 61.

10 Heath 1982, p. 62.

11 Heath 1982, p. 63.

12 Gottfried 1984, p. 330.

13 Kadushin 1988, p. 150.

14 Drawn from Kadushin, 1988, pp. 150-151. Formally, neglect is defined by one of the leading authorities, Norman Polansky, as a situation in which the caretaker “permits the child to experience avoidable present suffering and/or fails to provide one or more ingredients generally deemed essential for developing a person’s physical, intellectual or emotional capacities.” Quoted in Kadushin, p. 150.

15 Kaplun 1976; Smith and Adler 1991; Steele 1987; Trickett et al. 1991.

16 E.g., Azar et al. 1984. For a discussion of weaknesses in the state of knowledge about causes and an argument for continuing to treat abuse and neglect separately, see Cicchetti and Rizley 1981. See also Bousha and Twentyman 1984; Herrenkohl et al. 1983.

17 Some recent reviews of the evidence on causation are Hegar and Yungman 1989; Polansky 1981; Zuravin 1989. The intergenerational explanation is one of the most widely known. For a review of the literature and some important qualifications to assumptions about intergenerational transmission, see Kaufman and Zigler 1987.

18 Besharov 1991.

19 D. Besharov and S. Besharov, quoted in Pelton 1978, p. 608.

20 Parke and Collmer 1975.

21 Coser 1965; Horowitz and Liebowitz 1969.

22 Jensen and Nicholas 1984; Osborne et al. 1988.

23 Leroy H. Pelton’s literature review is still excellent on the studies through the mid-1970s, as is Garbarino’s. See Garbarino and Crouter 1978; Pelton 1978. Also see Straus and Gelles 1986; Straus et al. 1980; Trickett et al. 1991. Unless otherwise noted, the literature review in this section is not restricted to whites.

24 U.S. Department of Health and Human Services 1988; Wolfe 1985.

25 Gil 1970.

26 Reported in Pelton 1978.

27 Young and Gately 1988, pp. 247, 248.

28 Reported in Pelton 1990-1991.

29 Klein and Stern 1971; Smith 1975.

30 Baldwin and Oliver 1975.

31 Cohen et al. 1966; Johnson and Morse 1968.

32 Smith et al. 1974.

33 Pelton 1978, pp. 612-613.

34 Gil 1970. Recall that Chapter 6 demonstrated that cognitive ability was a stronger predictor of school dropout than socioeconomic status.

35 Brayden et al. 1992.

36 Crittenden 1988, p. 179.

37 Drotar and Sturm 1989.

38 Azar et al. 1984. See Steele 1987 for supporting evidence and Kravitz and Driscoll 1983 for a contrary view.

39 Bennie 1969.

40 Dekovic and Gerris 1992. For findings in a similar vein, see Goodnow et al. 1984; Keller et al. 1984; and Knight and Goodnow 1988. For studies concluding that parental reasoning is not related to social class, see Newberger and Cook 1983.

41 Polansky 1981, p. 43.

42 Most tantalizing of all was a prospective study in Minnesota that gave an extensive battery of tests to young, socioeconomically disadvantaged women before they gave birth. In following up these mothers, two groups were identified: one consisting of thirty-eight young women with high-stress life events and adequate care of their children (HS-AC), and the other of twelve young women with high-stress life events and inadequate care (HS-NC). In the article, data on all the tests are presented in commendable detail, except for IQ. In the “method” section that lists all the tests, an IQ test is not mentioned. Subsequently, there is this passage, which contains everything we are told about the mentioned test: “The only prenatal measure that was not given at 3 months [after birth] was the Shipley-Hartford IQ measure. The mean scores on this measure were 26.9 for the HS-AC group and 23.5 for the HS-NC group (p = .064).” Egeland 1980, p. 201. A marginally statistically significant difference with samples of 12 and 38 suggests a sizable IQ difference.

43 Friedman and Morse 1974; Reid and Tablin 1976; Smith and Hanson 1975.

44 Wolfe 1985.

45 Berger 1980.

46 Young 1964, cited by Berger 1980.

47 Wolfe 1985, pp. 473-474.

48 It is understandable that many survey studies cannot obtain a measure of IQ. But virtually all of the studies discussed called for extensive cooperation by the abusive parents. The addition of a short intelligence test would seem to have been readily feasible.

49 The actual quotation is dense but intriguing: “Moreover, they [the British researchers] have shown that parental competence (defined as sensitivity and responsiveness to infant cues, quality of verbalization, and physical contact, and related skills) and adjustment (e.g., low anxiety and adequate flexibility) were distinguishing abilities that moderated the impact of aversive life events” (Wolfe 1985, p. 478).

50 Honesty of the respondents apart, the NLSY data do not address this issue. The question about drinking asked how often a woman drank but not how much at any one time. Since a single glass of wine or beer a few times a week is not known to be harmful, the drinking data are not interpretable.

51 Roughly equal proportions of smokers in the low and high cognitive classes told the interviewers that they had cut down during pregnancy—about 60 percent of smokers in each case.

52 Leonard et al. 1990; Hack and others 1991.

53 “Low birth weight” is operationally defined as infants weighing less than 5.5 pounds at birth. This definition, however, mixes children who are carried to term and are nonetheless underweight with children who are born prematurely (which usually occurs for reasons over which the woman has no control) but who are otherwise of normal weight and development. In the jargon, these babies have a weight “appropriate for gestational age” (AGA). Babies who weighed less than 5.5 pounds but whose weight was equal to or higher than the medical definition of AGA (using the Colorado Intrauterine Growth Charts) were excluded from the analysis.

54 The dip in the proportion for Class V could also be an artifact of small sample sizes. The proportion (computed using sample weights) is produced by 9 out of 116 babies. Sample sizes for the other cognitive classes—II, III, and IV—were much larger: 573, 2,059, and 737, respectively.

55 Hardy and Mellis 1977.

56 Cramer 1987. In a revealing sign of the unpopularity of intelligence as an explanatory variable, Cramer treats years of education as a proxy measure of socioeconomic status. For other studies showing the relationship of education to infant mortality, see Bross and Shapiro 1982; Keller and Fetterly 1978.

57 This is a persistent issue in infant mortality research. There are varying opinions about how important the distinction between neonatal and infant deaths may be. See Eberstein and Parker 1984.

58 Duncan 1993.

59 The calculation assumes that the mother has average socioeconomic background.

60 It measures, among other things, the emotional and verbal responsiveness and involvement of the mother, provision of appropriate play materials, variety in the daily routine, use of punishment, and organization of the child’s environment. The HOME index was created and tested by Bettye Caldwell and Robert Bradley (Caldwell and Bradley 1984).

61 From Class IV to Class II, they were the 48th, 60th, and 68th percentile, respectively. For most of the assessments, including the HOME index, the NLSY database contains raw scores, standardized scores, and centile scores. For technical reasons, it is more accurate to work with standardized scores than percentiles when computing group means, conducting regression analyses, and so forth. On the other hand, centiles are much more readily understood by the ordinary reader. We have conducted all analyses using standardized scores, then converted the final results as reported in the tables back into centiles. Thus, the centiles in the table are not those that will be produced by simply averaging the HOME centile scores in the NLSY.
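
A small sketch of why the order of operations matters: averaging standardized scores and then converting to a centile gives a different number from averaging the centiles themselves. The group of scores is hypothetical, and the conversion assumes a standard normal distribution, which is an assumption of this example and not a description of the NLSY scoring.

```python
from statistics import NormalDist, mean

STANDARD_NORMAL = NormalDist()  # mean 0, SD 1

def to_centile(z):
    """Convert a standardized score to a centile under a normal assumption."""
    return 100.0 * STANDARD_NORMAL.cdf(z)

def centile_of_mean_z(z_scores):
    """Average the standardized scores first, then convert to a centile."""
    return to_centile(mean(z_scores))

def mean_of_centiles(z_scores):
    """Convert each score to a centile first, then average the centiles."""
    return mean([to_centile(z) for z in z_scores])

# Hypothetical standardized scores for a small group (not NLSY data).
group = [-2.0, -1.0, 0.5]

via_z = centile_of_mean_z(group)
via_centiles = mean_of_centiles(group)

print(f"centile of the mean z-score: {via_z:.1f}")
print(f"mean of individual centiles: {via_centiles:.1f}")
```

The two results differ because the conversion from standardized score to centile is nonlinear, so it does not commute with averaging.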

62 We replicated all of these analyses using the HOME index as a continuous variable, and the substantive conclusions from those replications are consistent with the ones reported here.

63 The HOME index has different scoring for children younger than 3 years old, children ages 3 through 5, and children ages 6 and older. We examined the HOME results for the different age groups and found that they could be combined without significant loss of precision for the interpretations we describe in the text. There is some evidence that the mother’s IQ was most important for the home environment of children ages 3 through 5 and least important for children ages 6 and older, but the differences are not dramatic.

64 E.g., Duncan 1993 and almost anything published by the Children’s Defense Fund.

65 We also conducted analyses treating family income as a continuous variable, which showed consistent results.

66 The poverty measure is based on whether the mother was below the poverty line in the year prior to the HOME assessment. Independent variables were IQ, mother’s socioeconomic background, mother’s age, the test year, and the child’s age group (for scoring the HOME index).

67 The table on page 222 shows the predicted odds of being in the bottom decile on the HOME index from a regression equation, using the child’s sample weights, in which the dependent variable is a binary representation of whether an NLSY child had a HOME score in the bottom decile, and the independent variables were mother’s IQ, mother’s socioeconomic background, mother’s age, and nominal variables representing the test year, the age category for scoring the HOME index, poverty in the calendar year prior to the administration of the HOME index, and receipt of AFDC in the calendar year prior to the administration of the HOME index.

Table columns: Mother’s IQ; Mother’s Socioeconomic Background; In Poverty?; On Welfare?; Odds of Being in the Bottom Decile on the HOME Index. Surviving row labels: “Very low” and “Average.”

“Very low” is defined as two SDs below the mean. Poverty and welfare refer to the calendar year prior to the scoring of the HOME index.

68 The NLSY reported scores on these indexes for infants under 1 year of age, not analyzed here.

69 This statement applies to the full white sample. In the cross-sectional sample, used for the regression results in Appendix 4, the role of birth status (legitimate or illegitimate) was not significant when entered along with poverty and welfare receipt.

70 A technical note that applies to the means reported in the table on page 230 and in Chapter 15. In applying the national norms, the NLSY declined to estimate scores for very low-scoring children not covered in the PPVT’s scoring tables, instead assigning them a score of zero. For purposes of computing the means above and in Chapter 15, we assigned a score of 40 (four SDs below the mean, and the lowest score assigned in the standard tables for scoring the PPVT) to all children with scores under 40.
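
The floor adjustment described in this note can be sketched in a few lines. The scores below are hypothetical and the helper name is an assumption; only the floor value of 40 comes from the note.

```python
from statistics import mean

PPVT_FLOOR = 40  # lowest score in the standard scoring tables, per the note

def apply_floor(scores, floor=PPVT_FLOOR):
    """Raise every score below the floor (including zeros) to the floor."""
    return [max(score, floor) for score in scores]

# Hypothetical scores; 0 marks a child the scoring tables did not cover.
raw_scores = [0, 35, 85, 100, 110]
floored_scores = apply_floor(raw_scores)

print("mean with zeros kept: ", mean(raw_scores))
print("mean after flooring:  ", mean(floored_scores))
```

Leaving the zeros in place would pull the group mean down far more than a score at the bottom of the scoring tables does.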

71 Careful readers may be wondering why white children, who have had less than their fair share of the bottom decile for most of the other indicators, account for fully 10 percent of all NLSY children in the bottom decile. The reason is that the women of the NLSY sample (all races) have had a high proportion of low-IQ children, based on the national norms for the PPVT—fully 23 percent of all NLSY children ages 6 and older when they took the test had IQs of 80 or lower. For whites, 10 percent of the children who have been tested fall into the bottom decile. This news is not quite as bad as it looks. Just because the NLSY mothers were a nationally representative sample of women in a certain age group does not mean that their children are a nationally representative sample of children. But the news is nonetheless worrisome, with implications that are discussed in Chapter 15.

72 See Chapter 4 for the discussion of heritability of IQ.

Chapter 11

1 The proportional increases in property crime tracked more or less with the increases in violent crime until the late 1970s. Since then, property crime has moved within a narrow range and in 1992 was actually lower than it had been ten years earlier. This divergence between violent and property crimes is in itself a potentially significant phenomenon that has yet to be adequately explored.

2 For citations of the extensive literature on this subject, see Chaiken and Chaiken 1983; Wilson and Herrnstein 1985. The official statistics may have understated the increase in these “crimes that people consider serious enough to warrant reporting to the police,” insofar as many burglaries, assaults, and street robberies that would have been reported in the 1950s (when there was a reasonable chance that the police would conduct a genuine investigation) are no longer reported in urban areas, where it is taken for granted that they are too minor to compete for limited police resources.

3 A more traditional way to sort the theories is to contrast classical theories, which depict crime as the rational behavior of free agents, based on costs and benefits, with positive theories, which look for the causes of crime in society or in psychological makeup (for discussion of criminological theory, see, for example, Gottfredson and Hirschi 1990; Wilson and Herrnstein 1985). We are distinguishing only among positive theories, because the notion of criminals as rational agents seems to fit few actual criminals and the role of costs and benefits can readily be absorbed by a positive theory of criminal behavior (see Wilson and Herrnstein 1985, Chap. 2). A distinction similar to ours between psychological and sociological theories is one between “psychiatric” and “criminological” theories in Wessely and Taylor 1991.

4 Freeman 1983; Mayer and Jencks 1989; Wilson and Herrnstein 1985, Chaps. 11, 12.

5 Cleckley 1964; Colaizzi 1989.

6 Wilson and Herrnstein 1985.

7 Wilson and Herrnstein 1985.

8 In fact, within criminological theory, the distinction between being disposed to break the law and being disposed to obey it has some resonance, as illustrated in, for example, Gottfredson and Hirschi 1990. This is a fine point of theory, which we cannot elaborate on here.

9 For more extended discussion of the logic of the link between IQ and committing crime, see Gottfredson and Hirschi 1990; Hirschi 1969; Wilson and Herrnstein 1985.

10 Goring 1913.

11 Goddard 1914.

12 Murchison 1926. We know now that this was a peculiarity of a federal prison like Leavenworth, which had relatively few of the run-of-the-mill offenders typical in state prisons.

13 Sutherland 1931.

14 Haskell and Yablonsky 1978, p. 268.

15 Reid 1979, p. 156.

16 Hirschi and Hindelang 1977.

17 Reid 1982.

18 A balanced, recent summary says, “At this juncture it seems reasonable to conclude that the difference [between offenders and nonoffenders in intelligence] is real and not due to any of the possible methodological or confounding factors that have been noted in the literature” (Quay 1987, p. 107ff.).

19 The gap between offenders and nonoffenders is typically larger on verbal than on performance (i.e., nonverbal) intelligence tests (Wilson and Herrnstein 1985). It has been suggested that this is because the essential difference between offenders and nonoffenders is the difference in g; it is well known that verbal scores are more dependent on g than performance scores (Gordon 1987; Jensen and Faulstich 1988). Another, not necessarily inconsistent, interpretation is that verbal intelligence scores do better at measuring the capacity for internalizing the prohibitions that help deter crime in nonoffenders (Wilson and Herrnstein 1985). Multiple offenders, as distinguished from offenders in general, also have significant deficits in logical reasoning ability per se (Reichel and Magnusson 1988). Whatever the reason for these patterns of differences, the methodological implications are clear: The rare study that fails to find much of an association between IQ and offending may have used nonverbal scores or scores that, for one reason or another, minimize individual differences in g.

20 E.g., Blumstein et al. 1985; Denno 1990. National studies of convicts who get rearrested after release also show that those with low levels of education (which are presumably correlated with low test scores) are at higher risk for recidivism (Beck and Shipley 1989).

21 Lipsitt et al. 1990.

22 Reichel and Magnusson 1988.

23 Hirschi 1969; Wilson and Herrnstein 1985.

24 Nicholson and Kugler 1991.

25 The evidence in fact suggests that smart offenders pick crimes with lesser likelihood of arrest and larger payoffs (Wilson and Herrnstein 1985).

26 Moffitt and Silva 1988; Hindelang et al. 1981; Hirschi and Hindelang 1977; Wilson and Herrnstein 1985.

27 Reichel and Magnusson 1988.

28 Kandel et al. 1988.

29 In this sample, there was no significant correlation between IQ and socioeconomic status, and IQ remained a significant predictor of offending even after the effects of parental SES and the sons’ own level of education were entered as covariates in an analysis of covariance.

30 White et al. 1989.

31 Werner and Smith 1982.

32 Werner 1989; Werner and Smith 1982.

33 For an entry into this literature, see Farrington and West 1990; Gottfredson and Hirschi 1990; Mednick and others 1987; Wilson and Herrnstein 1985.

34 In this regard, it is perhaps worth mentioning that we originally intended for this book to be about individual differences generally and social policy, with intelligence as the centerpiece. We narrowed the focus to intelligence partly because it looms so much larger than any other individual trait in explaining what is going on, but also out of necessity: Only for criminal behavior is the scientific literature extensive enough to have permitted a thoroughgoing presentation of individual differences other than intellectual.

35 The most serious problem is the established and pronounced tendency of black juveniles to underreport offenses (Hindelang 1978, 1981).

36 Not surprisingly, the most serious offenders are the ones who most often underreport their crimes. Serious offenders are also the ones most likely to go uninterviewed in survey research. At the other extreme, minor offenders brag about their criminal exploits. They inflate the real level of “crime” by putting minor incidents (for example, a school-yard fistfight, which can easily fit the technical definition of “aggravated assault”) in the same category with authentically felonious attacks.

Since we are focusing on the role of intelligence, self-report data pose a special problem, for it has been observed that people of low intelligence are less candid than brighter respondents. This bias would tend to weaken the correlation between IQ and crime in self-report data.

37 The authoritative source on self-report data for juveniles is still Hindelang et al. 1981. See also Hindelang 1978, 1981; Smith and Davidson 1986.

38 Wolfgang, Figlio, and Sellin 1972; Wilson and Herrnstein 1985.

39 These results for the entire age range are substantially the same when age subgroups are examined, but some differences may be found. Those who become involved with the criminal justice system at an early age tend to have lower intelligence than those who first become involved later in their teens.

40 This represents the top decile of white males. To use the same index across racial groups is inadvisable because of the different reporting characteristics of whites and blacks.

41 For a review of the literature, see Wilson and Herrnstein 1985.

42 Elliott and Voss 1974.

43 Thornberry et al. 1985 uses the Philadelphia Cohort Study to demonstrate rising crime after dropout for that well-known sample.

44 The sample includes those who got a GED—most of whom had gotten it at the correctional institution in which they were incarcerated at the time of their interview. The results are shown in Appendix 4.

Chapter 12

1 Gove 1964. The definition is listed, sadly, as “obsolete.” We can think of no modern word doing that semantic job now.

2 More recently, Walter Lippmann used civility in his worrying book (Lippmann 1955) about what he feared was disappearing with the rising “Jacobinism” of American political life, the shift he saw early in the century away from representative government toward populist democracy. Early in his career as a journalist and social commentator (Lippmann 1922b), Lippmann noted that the ordinary, private person sets the concerns of governance very low on his or her list of priorities. To govern us, he said, we needed a special breed of person, leaders with the capacity to fathom, and the desire to promote, the public good. That capacity is what he called civility. For a reflection on Lippmann’s conception of civility by a social scientist, see Burdick 1959.

3 There are other rationales for not voting, as, for example, the one promoted on a T-shirt favored by libertarians: “Don’t vote. It only encourages them.”

4 For an attempt to construe voting as a rational act from the economic standpoint, see Downs 1957.

5 Aristotle 1905 ed., p. 1129.

6 Although the sample was not strictly representative of the American population, it was a broad cross-section, unlikely to be atypical except as a result of its underrepresentation of rural and minority children. Hess and Torney 1967.

7 The second graders were excluded from some of the analyses because some questionnaire items evoked too high a rate of meaningless or nonresponses.

8 A measure of political efficacy was based on the children’s “agree” or “disagree” responses to five statements, including: “I don’t think public officials care much what people like me think.” Or, “People like me don’t have any say about what the government does.”

9 Harvey and Harvey 1970.

10 The exceptions included the measures for political efficacy and political participation, both of which were barely correlated with intelligence, although slightly correlated with socioeconomic status (primarily via parental education, rather than family wealth). The authors speculated that the rising cynicism of the young during the later 1960s may in part account for these deviant results.

11 Like other studies (e.g., Neuman 1986, see below), this one also found that the more intelligent someone is, the more likely he or she is to be liberal on social issues and conservative on economic ones. Chauvinistic, militaristic, and anticommunistic attitudes were inversely related to intelligence.

12 For a brief summary of this literature as of the late 1960s, see White 1969, who similarly concludes that political socialization, as he calls it, is highly dependent on intelligence itself rather than on socioeconomic status.

13 Sidney Verba and Norman Nie (1972), leading scholars of American voting, distinguish cogently between the study of politics as a political scientist approaches it and political psychology. A political scientist mostly wants to understand how political participation shapes the choices a community makes; a political psychologist tries to understand the participation itself. This chapter comes closer to political psychology than to political science.

14 Campbell et al. 1960; Milbrath and Goel 1977; Verba and Nie 1972; Wolfinger and Rosenstone 1980.

15 Wolfinger and Rosenstone 1980, p. 13.

16 Verba and Nie 1972.

17 The one exception, the frequency with which an individual contacted political officials for matters of personal concern, showed no such correlation, but it is also the most ambiguously political. See Verba and Nie 1972.

18 There are hints, however, that, if socioeconomic status had been broken into components of educational level and income, educational level would have predicted political participation better than income. See Figures 6-1 to 6-3 in Verba and Nie 1972.

19 Wolfinger and Rosenstone 1980. In even-numbered years, the CPS, a survey conducted monthly of a nationally representative sample of tens of thousands of Americans, asks about voting in the November election. These surveys also include data on income, occupation, education, and other personal and regional variables. The Wolfinger and Rosenstone analysis was based on the entire sample of almost 100,000 respondents in the November surveys in 1972 and 1974 and a random subsample used for more detailed modeling. The main technique they used is probit analysis, a form of multivariate analysis for estimating the change in probability of some dependent variable (voting, in this case) associated with a change in an independent variable (educational attainment, for example) after the effects of the other variables (say, income or occupational level) are taken into account.
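
To make the probit idea concrete, the following sketch computes the change in predicted probability when one independent variable moves and the others are held fixed. The coefficients, variable names, and baseline values are hypothetical and are not taken from Wolfinger and Rosenstone; the sketch only illustrates the general form of a probit model, in which the probability is the standard normal CDF of a linear index.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()

def probit_probability(intercept, coefficients, values):
    """P(outcome) = Phi(intercept + sum of coefficient * value)."""
    index = intercept + sum(coefficients[name] * values[name]
                            for name in coefficients)
    return STANDARD_NORMAL.cdf(index)

def change_in_probability(intercept, coefficients, baseline, variable, delta):
    """Change in probability when one variable moves by delta, others fixed."""
    shifted = dict(baseline)
    shifted[variable] += delta
    return (probit_probability(intercept, coefficients, shifted)
            - probit_probability(intercept, coefficients, baseline))

# Hypothetical model: both predictors measured as deviations from the mean.
coefs = {"education": 0.5, "income": 0.2}
baseline = {"education": 0.0, "income": 0.0}

effect = change_in_probability(0.0, coefs, baseline, "education", 1.0)
print(f"change in probability of voting: {effect:.3f}")
```

In a probit model the size of this change depends on the baseline, since the normal CDF is steepest near an index of zero and flatter in the tails.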

20 E.g., Peterson 1990.

21 Neuman 1986. This book aggregates data from nine studies of voting between 1948 and 1980 and comes up with a measure of “political sophistication,” which seems to have considerable power in explaining much about voting, including simple turnout. The “key causal factor” for political sophistication, Neuman found, is education, which explained four times as much of the variance in sophistication as the next most influential factor in a list that included age, race, sex, the other components of socioeconomic status, parental behavior, and region of the country.

22 Wolfinger and Rosenstone 1980, p. 19.

23 Besides the works already cited, for other overviews coming to the same basic conclusion, see Campbell et al. 1960; Milbrath and Goel 1977; Neuman 1986.

24 “It is difficult to find support in our data for notions that a generic status variable plays any part in the motivational foundations of the decision to vote” (Wolfinger and Rosenstone 1980, p. 35). Perhaps there is some effect of income on voting at the lowest levels but throughout the range of income, it seems to have no independent predictive value of its own.

25 Verba and Nie 1972, p. 335.

26 How someone votes, rather than whether, can be more plausibly connected to the outward benefits gained from the outcome of an election. And many political scientists focus more on political preference than on level of engagement. Political preferences, too, have their individual correlates, but we will not try to summarize these results as well (but see, for example, Fletcher and Forbes 1990; Granberg and Holmberg 1990; Milbrath 1977; Neuman 1986; Nie et al. 1976).

27 There is an indirect argument to be made by combining four observations: (1) We know for sure that one of the traits roughly measured by educational attainment is intelligence. (2) As we showed in Chapter 1, American educational opportunities are more efficiently distributed by cognitive ability than they have ever been, here or elsewhere. (3) It is here and now that we see the strongest correlations between voting and educational attainment. (4) In countries where education and cognitive ability are not so thoroughly enmeshed, education has less impact on voting. To fill in the story: During the 1950s and 1960s, the level of political participation rose more rapidly than the educational level of the population (Verba and Nie 1972, p. 252). Looking backward, we see the other side of the same coin. In 1870, only 2 percent of the American population had finished high school; even fewer were going to college. Yet voting rates may have been higher than they are now. Kleppner (1982) concludes that voting rates were more than 11 percentage points above where they should have been, had education had the same effects in the 1880s that it had in 1968. Shortridge (1981) has a lower estimate of voter turnout in the late nineteenth century, but still one that exceeds expectations, given the educational levels of the period. Proper historical comparisons must, of course, take into account changes in voting laws, in poll taxes, in registration requirements, as well as the effects of the extension of suffrage to women and to 18- to 20-year-olds. However, after all those corrections are made, scholars agree that past voting rates (post-Civil War, nineteenth century, for example) are incommensurately high or present rates are incommensurately low, given the changes in levels of formal education of the general public. Except in the South of the Reconstruction, the correlation between education and voting rate was negative from 1876 to 1892, just the reverse of what it is now (see Kleppner 1982). The international data indicating that education is less important in voting where education is not so enmeshed with cognitive ability come from Milbrath and Goel (1977).

28 Exposure to political print media was another influential factor, but this, too, turned out to be most strongly associated with rated intelligence (see Luskin 1990).

29 The so-called Bay Area Survey, described in Neuman 1981, 1986.

30 See note 21.

31 Neuman 1986, p. 117.

32 Useful summaries can be found in Abramson and Claggett 1991; Hill and Luttbeg 1983; Kleppner 1982; Peterson 1990; Rothenberg and Licht 1982.

33 E.g., Milbrath and Goel 1977. Biological and social scientists have lately tried to enrich our understanding of “political man” by showing the links to social behavior in other species. For background to the huge literature on the variety of influences on political behavior and attitudes, see Converse 1964; Kinder and Sears 1985; Rokeach 1973.

34 Harvey and Harvey 1970.

35 Neuman 1986.

36 Luskin 1990.

Chapter 13

1 For a useful recent critique of the treatment of race by psychologists, also demonstrating how difficult (impossible?) it is to be detached about this issue, see Yee et al. 1993.

2 Lynn 1991c.

3 Lynn 1987a. For a critique of Lynn’s early work, see Stevenson and Azuma 1983.

4 For those who want to reconstruct the debate, Lynn’s 1987 and 1991 review articles followed on earlier studies: Lynn 1977, 1978, 1982; Lynn and Hampson 1986b. For his response to Flynn’s 1987 critique, see Lynn 1987b.

5 Chan and Vernon 1988.

6 Lynn and Song 1994.

7 Iwawaki and Vernon 1988; Vernon 1982.

8 Flynn 1991; Sue and Okazaki 1990.

9 Flynn 1991.

10 Lynn 1993b.

11 Lynn 1987a, 1987b, 1989, 1990a, 1990b, 1991b, 1991c, 1992, 1993a, 1993b; Lynn and Hattori 1990; Lynn, Pagliari, and Chan 1988.

12 Lynn, Hampson, and Iwawaki 1987.

13 Lynn 1991c.

14 Stevenson et al. 1985.

15 Lynn 1991a, p. 733. Lynn has noted that the mean white IQ in Minnesota is approximately 105, well above the average for the American white population. On the other hand, it is possible that the cities chosen in Japan and Taiwan were similarly elevated.

16 An excellent account of the literature may be found in Storfer 1990, pp. 314-321, from which our generalizations are taken. For Jews in Britain, see also Lynn 1992.

17 Storfer 1990, pp. 321-323.

18 As reported in Jensen 1984b, p. 479.

19 Sattler 1988.

20 A detailed and comprehensive review of the literature through 1980 may be found in Osborne and McGurk 1982; Shuey 1966. For an excellent one-volume synthesis and analysis, see Loehlin, Lindzey, and Spuhler 1975.

21 Standard deviations are explained in Appendix 1.

22 To qualify, all studies had to report data for both a white and black sample, with a sample size of at least fifty in each group, drawn from comparable populations that purported to be representative of the general population of that age and geographic area (studies of special populations such as delinquents were excluded). Socioeconomic status posed a special problem. If a study explicitly matched subjects by SES, it was excluded. If it simply drew its samples from a low-SES area, it was included, even though some degree of matching had occurred. The study had to use a standardized test of cognitive ability, although not all of them were IQ tests and not all included a complete battery. If the scores were reported as IQs, a standard deviation of 15 was imputed if no standard deviations for that sample were given.

23 To get the IQ equivalent of SD differences, multiply the SD difference by 15; hence, 1.08 × 15 = 16.2 IQ points.

24 This figure is based on non-Latino whites. The difference between blacks and the combined white-Latino sample in the NLSY is 1.12 SDs. Because the U.S. Latino population was proportionally very small until the 1970s, the NLSY figure for non-Latino whites is more comparable to the earlier tests, in terms of definition of the sample, than the figure for the combined white-Latino sample, and we shall use it exclusively in discussions of the NLSY data throughout the chapter.

25 The formula is $(\bar{X}_w - \bar{X}_b) / \sqrt{(N_w \sigma_w^2 + N_b \sigma_b^2) / (N_w + N_b)}$, where N is the sample size, X is the sample mean, σ is the standard deviation, and w and b stand for white and black, respectively (taken from Jensen and Reynolds 1982, p. 425). Note that our white sample differs from the one used in Office of the Assistant Secretary of Defense (Manpower) (1982). The “white” sample in that report included all persons not identified as Hispanic or black, whereas our “white” sample also excluded persons identifying themselves as American Indians or a member of an Asian or Pacific ethnic group. The NLSY and the AFQT are described in the Introduction to Part II and Appendix 2.
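
A difference expressed in standard deviation units, of the kind this note describes, can be sketched as follows. The sketch assumes an N-weighted pooled standard deviation, and the sample sizes, means, and standard deviations in the example are hypothetical, not NLSY figures.

```python
import math


def standardized_difference(n_w, mean_w, sd_w, n_b, mean_b, sd_b):
    """Mean difference divided by an N-weighted pooled standard deviation.

    Assumes the pooled variance is the sample-size-weighted average of the
    two group variances.
    """
    pooled_variance = (n_w * sd_w ** 2 + n_b * sd_b ** 2) / (n_w + n_b)
    return (mean_w - mean_b) / math.sqrt(pooled_variance)


# Hypothetical inputs: equal group sizes and equal standard deviations of 15.
d = standardized_difference(n_w=100, mean_w=100.0, sd_w=15.0,
                            n_b=100, mean_b=85.0, sd_b=15.0)
print(f"difference in SD units: {d:.2f}")
```

When the two groups have unequal standard deviations, the weighting by sample size determines which group's spread dominates the denominator.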

26 This is a very rough estimate. As of 1994 there were approximately 32.8 million blacks in America. If the estimate is computed based on the mean IQ (86.7) and standard deviation (12.4) of blacks in the NLSY, a table of the normal distribution indicates that only about 0.1 percent, or about 33,000, would have IQs of 125 or higher. If one applies the observed distribution in the NLSY and asks what proportion of blacks are in the top five percent of the AFQT distribution (roughly corresponding to an IQ of 125), the result, 0.4 percent, implies that the answer is about 131,000. There are reasons to think that both estimates err in different directions. We compromised with 100,000.
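
The two estimates in this note can be retraced with a short sketch. It uses the population, mean, standard deviation, and 0.4 percent figure given above; treating the scores as exactly normally distributed is the note's own simplifying assumption for the first estimate.

```python
from statistics import NormalDist

POPULATION = 32_800_000  # approximate 1994 figure given in the note


def share_at_or_above(threshold, mean, sd):
    """Upper-tail share of a normal distribution with the given mean and SD."""
    return 1.0 - NormalDist(mu=mean, sigma=sd).cdf(threshold)


# Estimate 1: normal distribution with the NLSY mean (86.7) and SD (12.4).
normal_share = share_at_or_above(125, mean=86.7, sd=12.4)
normal_count = normal_share * POPULATION

# Estimate 2: observed NLSY share in the top 5 percent of the AFQT (0.4 percent).
observed_count = 0.004 * POPULATION

print(f"normal-curve share: {normal_share:.4%}, count: {normal_count:,.0f}")
print(f"observed-share count: {observed_count:,.0f}")
```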

27 For example, no external evidence for bias has turned up with the WISC, WAIS, Stanford-Binet, Iowa Test of Educational Development, California Achievement Test, SAT, ACT, GRE, LSAT, MCAT, Wonderlic Personnel Test, GATB, and ASVAB (including the AFQT in particular).

28 If any bias has been found, it shows that test scores for blacks often “overpredict” performance; that is, the tests are biased “in favor” of blacks, tending for unknown reasons to predict higher performance than is actually observed. See Appendix 5 for details.

29 Weiss 1987, p. 121. A separate argument, made in Zoref and Williams (1980), adduced evidence that verbal items in IQ tests are disproportionately based on white males “in role-stereotyped representations.” The authors do not present evidence that performance on these items varies by race or gender in ways that would indicate bias but rather indict the tests as a whole on the basis of their sexism and racism.

30 The reason why the “oarsman:regatta” example has been used so often in descriptions of cultural bias is that it is one of the few items in the SAT that looks so obviously guilty. Perhaps if a test consisted exclusively of items that were equivalent to the example, it would be possible to demonstrate cultural bias statistically, but no modern test has more than a few that come close to “oarsman:regatta.”

31 The definitive assessment of internal evidence of bias is in Jensen 1980.

32 E.g., Valencia and Rankin 1988; Munford and Muñoz 1980.

33 For a review, see Jensen 1980.

34 The NLSY has higher scores for whites than blacks on backward digit span and virtually no difference at all for forward digit span. In a similar way, SES differences within races are also greater for backward digit tests than forward digit tests (Jensen and Figueroa 1975).

35 Gordon 1984. See Farrell 1983, and the attached responses, for an attempt to explain the difference in digit span results through cultural bias hypotheses.

36 Another commonly used apparatus involves a home button and a pair of other buttons, for yes and no, in response to tasks presented by a computer console. The results from both types of apparatus are congruent.

37 The literature is extensive, and we are bypassing which aspect of reaction time in fact covaries with g. For our purposes, it is only necessary that some aspects do so. For some of the issues, see, for example, Barrett, Eysenck, and Lucking 1986; Matthews and Dorn 1989; Vernon 1983; Vernon et al. 1985.

38 Jensen and Munro 1979.

39 Jensen 1993b.

40 The dependent variable is age-equated IQ score, and the independent variables are a binary variable for race (white or black) and the parental SES index. The difference between the resulting predicted IQs is divided by the pooled weighted standard deviation.

41 Among the young women in the RAND study of adolescent pregnancy described in Chapter 8 (Abrahamse et al. 1988), drawn from the nationally representative High School and Beyond sample, the same procedure reduced the B/W difference by 32 percent. See also Jensen and Reynolds 1982 and Jensen and Figueroa 1975.

42 For some people, controlling for status is a tacit way of isolating the genetic difference between the races. This logic is as fallacious as the logic behind controlling for SES that ignores the ways in which IQ helps determine socioeconomic status. See later in the chapter for our views on genetics and the B/W difference.

43 In other major studies the B/W difference continues to widen even at the highest SES levels. In 1975, for example, Jensen and Figueroa (1975) obtained full-scale WISC IQ scores for 622 whites and 622 blacks, ages 5 to 12, from a random sample of ninety-eight California school districts. They broke down the scores into ten categories of SES, using Duncan’s index of socioeconomic prestige based on occupation. They found a B/W discrepancy that went from a mere .13 SD in the lowest SES decile up to 1.20 SD in the highest SES decile. Going to the opposite type of test data, the Scholastic Aptitude Test taken by millions, self-selected with a bias toward the upper end of the cognitive distribution, the same pattern emerged. In 1991, to take a typical year, the B/W difference among students whose parents had less than a high school diploma was .58 SD (averaging verbal and mathematical scores), while the B/W difference among students whose parents had a graduate degree was .78 SD. (National Ethnic/Sex Data for 1991, unpublished data available by request from the College Board). In their separate reviews of the literature, Audrey Shuey (whose review was published in 1966) and John Loehlin and his colleagues (review published in 1975) identified thirteen studies conducted from 1948 through the early 1970s that presented IQ means for low- and high-SES groups by race. In twelve of the thirteen studies, the black-white difference in IQ was higher for the higher-SES group than for the lower-SES group. For similar results for the 1981 standardization of the WAIS-R, see Reynolds et al. 1987. A final comment is that the NLSY also shows an increasing B/W difference at the upper end of the socioeconomic scale when the 1980 AFQT scoring system is used and the scores are not corrected for skew. See Appendix 2 for a discussion of the scoring issues.

44 Kendall, Verster, and Mollendorf 1988.

45 Kendall, Verster, and Mollendorf 1988. For another example, this time of an entire book devoted to testing in the African setting that fails to mention a single mean, see Schwarz and Krug 1972.

46 Lynn 1991c.

47 Boissiere et al. 1985.

48 Owen 1992.

49 Reynolds et al. 1987.

50 Vincent 1991.

51 Vincent also cites two nonnormative studies of children in which the B/W differences ranged from only one to nine points. These are the differences after controlling for SES, which, as we explain in the text, shrinks the B/W gap by about one-third.

52 Jensen 1984a; Jensen and Naglieri 1987; Naglieri 1986. They point out that the K-ABC test is less saturated with g than a conventional IQ measure and more dependent on memory, both of which would tend to reduce the B/W difference (Naglieri and Bardos 1987).

53 Jensen 1993b.

54 Based on the white and black SDs for 1980, the first year that standard deviations by race were published.

55 Wainer 1988.

56 Our reasons for concluding that the narrowing of the B/W differences on the SAT was real, despite the potential artifacts involved in SAT scores, are as follows. Regarding the self-selection problem, the key consideration is that the proportion of blacks taking the test rose throughout the 1976-1993 period (including the subperiod 1980-1993). In 1976, blacks who took the SAT represented 10 percent of black 17-year-olds; in 1980, the proportion had risen to 13 percent; by 1993, it had risen to about 20 percent. While this does not necessarily mean that blacks taking the SAT were coming from lower socioeconomic groups (the data on parental education and income from 1980 to 1993 indicate they were not), the pool probably became less selective insofar as it drew from lower portions of the ability distribution. The improvement in black scores is therefore more likely to be understated by the SAT data than exaggerated.

Howard Wainer (1988) has argued that changes in black test scores are uninterpretable because of anomalies that could be inferred from the test scores of students who did not disclose their ethnicity on the SAT background questionnaire (nonresponders). Apart from several technical questions about Wainer’s conclusions that arise from his presentation, the key point is that the nonresponder population has diminished substantially. As it has diminished, there are no signs that the story told by the SAT is changing. The basic shape of the falling trendline for the black-white difference cannot plausibly be affected by nonresponders (though the true means in any given year might well be somewhat different from the means based on those who identify their ethnicity).

57 The range of .15 to .25 SD takes the data in both the text and Appendix 5 into account. To calculate the narrowing in IQ terms, we need to estimate the correlation between IQ and the various measures of educational preparation. A lower correlation would shrink the estimate of the amount of IQ narrowing between blacks and whites, and vice versa for a higher estimate. The two- to three-point estimate in the text assumes that this correlation is somewhere between .6 and .8. If we instead rely entirely on the SAT data and consider it to be a measure of intelligence per se, then the narrowing has been four points in IQ, but only for the population that actually takes the test.

58 A change of one IQ point in a generation for genetic reasons is not out of the realm of possibility, given sufficient differential fertility. However, the evidence on differential fertility (see Chapter 15) implies not a shrinking black-white gap but a growing one.

59 Jaynes and Williams 1989; Jencks and Peterson 1991.

60 Linear extrapolations are not to be taken seriously in these situations. A linear continuation of the black and white SAT trends from 1980 to 1990 would bring a convergence with the white mean in the year 2035 on the Verbal and 2053 on the Math. And when it occurs, racial differences would not be ended, for if we apply the same logic to the Asian scores, in that year of 2053 when blacks and whites both have a mean of 555 on the Math test, the Asian mean would be 632. The Asian Verbal mean (again, based on 1980-1990) would be 510 in the year 2053, forty-seven points ahead of the white mean. But (such is the logic of linear extrapolations from a short time period) the black Verbal score would by that time have surpassed the white mean by thirty-seven points and would be 500, only ten points behind the Asians. In 2069, the black Verbal mean would surpass the Asian Verbal mean. Linear trends over short periods of time cannot be sensibly extrapolated much into the future, notwithstanding how often one sees such extrapolations in the media.

61 See Appendix 5 for ACT results. In short, the mean rose from 16.2 in 1986 to 17.1 in 1993. The number of black ACT students also continued to rise during this period, suggesting that the increase after 1986 was not the result of a more selective pool.

62 Chapter 18 explores this line of thought further.

63 SAT trends are subject to a variety of questions relating to the changing nature of the SAT pool. The discussion that follows is based on unreported analyses checking out the possibility that the results reflect these potential artifacts (e.g., changes in the proportion of Asians using English as their first language; changes in the proportion of students coming from homes where the parents did not go to college). The discussion of these matters may be found in Chapter 18.

64 The first year for which a frequency distribution of scores by ethnicity has been published is 1980.

65 Trying to predict trends on the basis of equivalent percentage changes from different baselines is a treacherous proposition. A comparison with black and Asian gains makes the point. For example, the percentage of blacks scoring in the 700s on the SAT-Verbal grew by 23 percent from 1980 to 1990, within a percentage point of the Asian proportional increase. For students scoring in the 600s, the black increase was 37 percent, not far below the Asian increase of 48 percent. The difficulty with using proportions in this instance is that the baselines are so different. Take the case of students scoring in the 600s on the SAT-V, for example. The proportions that produced that 37 percent increase for blacks were eleven students out of a thousand in 1980 versus fifteen students out of a thousand in 1990. The Asian change, put in the same metric, was from fifty-five students in 1980 to eighty-one students in 1990. For every four students per thousand that blacks gained in the 600 group, Asians gained twenty-six per thousand.
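
The contrast between proportional and absolute change in this note can be checked with a few lines. The inputs are the rounded per-thousand figures quoted above; the note's 37 and 48 percent presumably come from unrounded proportions, so the percentages computed here differ slightly.

```python
def changes(per_thousand_start, per_thousand_end):
    """Return (absolute change per thousand, percentage change)."""
    absolute = per_thousand_end - per_thousand_start
    percent = 100.0 * absolute / per_thousand_start
    return absolute, percent


# Students per thousand scoring in the 600s on the SAT-Verbal, 1980 and 1990.
black_abs, black_pct = changes(11, 15)
asian_abs, asian_pct = changes(55, 81)

print(f"black: +{black_abs} per thousand ({black_pct:.1f} percent)")
print(f"Asian: +{asian_abs} per thousand ({asian_pct:.1f} percent)")
```

Similar percentage gains correspond to very different absolute gains because the starting proportions differ by a factor of five.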

66 This statement is based on a calculation that assumes that the 1980 distribution of scores remained the same except for the categories of interest. To illustrate, in 1980, 19.8 percent of black students scored from 200 to 249. In 1993, only 13.1 percent scored in that range. Suppose that we treat the percentage distribution for 1980 as if it consisted of 1,000 students. In that year, 198 of those students scored in the 200 to 249 range. We then recompute the mean for the 1980 distribution, substituting 128 for 198 in the 200 to 249 point category (assigning midpoint values to all the intervals to reach a grouped mean), so in effect we are calculating a mean for a fictitious population of 1,000 − 198 + 128 = 930. (The actual calculations used unrounded proportions based on the actual frequencies in each interval.)

A technical note for those who might wish to reproduce this analysis: When means are computed from grouped data, the midpoint of an interval is not necessarily the actual mean of people in that interval, usually because more than 50 percent of the scores will tend to be found in the fatter part of the distribution covered by the interval but also because scores may be bunched at the extreme categories. In the SAT-Math, for example, a disproportionate number of the people in the interval from 750 to 800 have scores of 800 and of those in the interval from 200 to 249 have scores of 200 (because they guessed wrong so often that their score is driven down to the minimum). Such effects can produce a noticeable bias in the estimated mean. For example, the actual verbal mean of black students in 1980 was 330. If one computes the mean based on the distribution published annually by the College Board, which runs in fifty-point intervals from 200 to 800, the result is 336.4. The actual mean in 1990 was 352; the grouped mean is 357.9. The computed figure in the text is based on the surrogate mean as described above compared to the grouped 1980 and 1990 means, to provide a consistent framework.
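
A minimal sketch of the surrogate-mean procedure described in this note follows. Only the counts 198 and 128 for the 200 to 249 interval come from the note; every other count in the distribution is a hypothetical placeholder, so the printed means are illustrative and do not reproduce the College Board figures.

```python
def grouped_mean(counts_by_interval):
    """Mean of grouped data, assigning each interval its midpoint value."""
    total = sum(counts_by_interval.values())
    weighted = sum((low + high) / 2 * count
                   for (low, high), count in counts_by_interval.items())
    return weighted / total


# Distribution per 1,000 students. Only the first count (198) is from the note;
# all other counts are hypothetical placeholders.
base_year = {
    (200, 249): 198, (250, 299): 250, (300, 349): 230, (350, 399): 160,
    (400, 449): 90, (450, 499): 45, (500, 549): 17, (550, 599): 7,
    (600, 649): 2, (650, 699): 1,
}

# Substitute the later count for one interval and leave the rest unchanged,
# giving a fictitious population of 1,000 - 198 + 128 = 930.
surrogate = dict(base_year)
surrogate[(200, 249)] = 128

print(f"fictitious population: {sum(surrogate.values())}")
print(f"grouped mean, base year: {grouped_mean(base_year):.1f}")
print(f"surrogate grouped mean:  {grouped_mean(surrogate):.1f}")
```

The gap between the two grouped means isolates how much of a change in the overall mean is attributable to the one interval that was substituted.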

67 The contrast with the Asian experience on the SATs is striking. The Asian Math mean rose from 509 to 535. Of this increase, none of it was due to decreases in students scoring less than 200 (compared to 22 percent for blacks), while a remarkable 54 percent was due to gains in the 700 and up group (compared to 3 percent for blacks). Meanwhile, on the Verbal test, the Asian mean rose from 396 to 415 from 1980 to 1993. Of this, only 17 percent occurred because of reductions in Asians scoring in the 200s (compared to 51 percent for blacks), while 9 percent occurred because of increases in Asians scoring in the 700s (compared to 0.4 percent for blacks). The Asian increase in test scores has been driven by improvements among the best students, while the black increase has been driven by improvements among the worst students. We are unable to find any artifacts in the changing nature of the black and Asian SAT pools that would explain these results. The continued Asian improvement makes it difficult to blame the slowdown in black improvement in the last decade on events that somehow made it impossible for any American students to make progress. Explanations could be advanced based on events specific to blacks.

68 Snyderman and Rothman 1988. The sample was based on random selections from the Members and Fellows of the American Educational Research Association, National Council on Measurement in Education, six divisions of the American Psychological Association (Developmental Psychology, Educational Psychology, Evaluation and Measurement, School Psychology, Counseling Psychology, and Industrial and Organizational Psychology), the Behavior Genetics Association, the Cognitive Science Society, and the education division of the American Sociological Association.

69 Brody 1992, p. 309.

70 Gould 1984, pp. 26-27.

71 Gould 1984, p. 32. See Lewontin, Rose, and Kamin 1984, p. 127, for a similar argument.

72 Gould 1984, p. 33.

73 The ramifications for public policy are dealt with in detail in Chapters 19 and 20, concerning affirmative action.

74 We do not include in the text any discussion of Philippe Rushton’s intensely controversial writings on the differences among Asian, white, and black populations. For a brief account, see Appendix 5.

75 A similar example can be found in Lewontin 1970, one of the most outspoken critics of the IQ enterprise in all its manifestations.

76 The calculation proceeds as follows: The standard deviation of IQ being 15, the variance is therefore 225. We are stipulating that environment accounts for .4 of the variance, which equals 90. The standard deviation of the distribution of the environmental component of IQ is the square root of 90, or 9.49. The difference between group environments necessary to produce a fifteen-point difference in group means is 15/9.49, or 1.58, and the difference necessary to produce a three-point difference is 3/9.49, or .32. The comparable figures if heritability is assigned the lower bound value of .4 are 1.28 and .26. If heritability is assigned the upper-bound value of .8, then the comparable figures are 2.24 and .45.
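
The arithmetic in this note can be retraced with the sketch below, which follows the steps as stated: take the environmental share of the variance, convert it to a standard deviation, and divide the group difference by it. The function name and parameters are illustrative. For a heritability of .4 the sketch yields about 1.29 where the note gives 1.28, a small difference that presumably reflects rounding.

```python
import math


def required_environment_gap(heritability, iq_gap, iq_sd=15.0):
    """Gap between group environments, in SDs of the environmental component,
    needed to produce a given IQ gap when environment explains
    (1 - heritability) of the IQ variance."""
    environmental_variance = (1.0 - heritability) * iq_sd ** 2
    environmental_sd = math.sqrt(environmental_variance)
    return iq_gap / environmental_sd


for h in (0.6, 0.4, 0.8):
    gap_15 = required_environment_gap(h, 15)
    gap_3 = required_environment_gap(h, 3)
    print(f"heritability {h}: {gap_15:.2f} SDs for 15 points, "
          f"{gap_3:.2f} SDs for 3 points")
```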

77 Stevenson et al. 1985.

78 Lynn 1987a.

79 Frydman and Lynn 1989.

80 Iwawaki and Vernon 1988; McShane and Berry 1988.

81 Vernon 1982, p. 28. It has been argued that the 110 figure is too high, but a verbal-visuospatial difference among Asian Americans is not disputed (Flynn 1989).

82 Supplemental evidence has been found among Chinese students living in China who were given the SAT. Several hundred Chinese students in Shanghai between the ages of 11 and 14 scored extremely high on the Math SAT, despite an almost total lack of familiarity with American cognitive ability testing. As a proportion of the total population, this represented a far greater density of high math scorers in Shanghai than in the United States. Further attempts to find high scorers in Chinese schools confirmed the original results in Shanghai (Stanley, Feng, and Zhu 1989).

83 The SAT data actually provide even more of a hint about genetic origins for the test-score pattern, though a speculative one. The College Board reports scores for persons whose first language learned is English and for those whose first language is “English and another.” It is plausible to assume that Asian students whose only “first language” was English contain a disproportionate number of children of mixed parentage, usually Asian and white, compared to those in whose homes both English and an Asian language were spoken from birth. With that hypothesis in mind, consider that the discrepancy between the Verbal and Math SATs was (in IQ points) only 1.7 points for the “English only” Asians and 5.3 points for the “English and another” first-language Asians. Nongenetic explanations are available. For example, one may hypothesize that although English and another language were both “first languages,” English wasn’t learned as well in those homes; hence the Verbal scores for the “English and another” homes were lower. But then one must also explain why the Math scores of the “English and another” Asians were twenty-one SAT points higher than the “English-only” homes. Here one could hypothesize that the “English-only” Asians were second- and third-generation Americans, more assimilated, and therefore didn’t study math as hard as their less assimilated friends (although somehow they did quite well in the Verbal test). But while alternative hypotheses are available, the consistency with a genetic explanation suggests that it would be instructive to examine the scores of children of full and mixed Asian parentage.

84 A related topic that we do not review here is the comparison of blacks and whites on Level I and Level II abilities, using Jensen’s two-level theory of mental abilities (Jensen and Figueroa 1975; Jensen and Inouye 1980). The findings are consistent with those presented under the discussion of WISC-R profiles and Spearman’s hypothesis.

85 “Spearman’s hypothesis” is named after an observation made by Charles Spearman in 1927. Noting that the black-white difference varied systematically for different kinds of tests, Spearman wrote that the mean difference “was most marked in just those [tests] which are known to be most saturated with g” (Spearman 1927, p. 379). Spearman himself never tried to develop his comment into a formal hypothesis or to test it.

86 Jensen and Reynolds 1982.

87 Jensen and Reynolds actually compared large sets of IQ scores with the full-scale IQ score held constant statistically.

88 Jensen and Reynolds 1982, p. 427; Reynolds and Jensen 1983.

89 Jensen and Reynolds 1982, pp. 428-429.

90 Jensen 1985, 1987a.

91 Jensen 1993b.

92 Braden 1989.

93 Jensen 1993b.

94 The correlations between g loading and black-white difference are typically in the .5 to .8 range.

95 A concrete example is provided by the Kaufman Assessment Battery for Children (K-ABC), a test that attained some visibility in part because the separation between black and white children on it is smaller than on more standard intelligence tests. It was later found that K-ABC is a less valid measure of g than the standard tests (Jensen 1984a; Kaufman and Kaufman 1983; Naglieri and Bardos 1987).

96 E.g., Pedersen et al. 1992. Jensen limits himself to discussing Spearman’s hypothesis on the phenotypic level.

97 Jensen 1977.

98 Some other studies suggest a systematic sibling difference for national populations, but it goes the other way: Elder siblings outscore younger siblings in some data sets. However, this “birth-order” effect, when it occurs at all, is much smaller than the effect Jensen observed.

99 Jensen 1985, 1987a.

100 Various technical arguments were advanced against Jensen’s claim that blacks and whites differ the most on tests that are the most highly loaded on g. Many of these were effectively resolved within the forum. One critic hypothesized that Jensen’s findings resulted from an artifact of varying reliabilities (Baron 1985). Jensen was able to demonstrate that corrections for unreliability did not wash out the evidence for Spearman’s hypothesis and that some of the tests with low g loadings had high reliabilities to begin with, contrary to the critic’s assumption. Another commentator suggested that Jensen had inadvertently built into his own analysis the very correlation between g loading and black-white difference that he purported to discover (Schonemann 1985; see also Wilson 1985). In the next round (the forum occupied two issues of the journal), after being apprised of a response by physicist William Shockley (Shockley 1987), that commentator withdrew his argument. A less serious criticism suggested that black-white differences did indeed correlate with some general factor that turns up to varying degrees in different intelligence tests but that the factor may not be g (Borkowski and Maxwell 1985). To this criticism, Jensen was able to demonstrate that the g factor accounted for so large a fraction of the total variance in test scores that no other general factor could possibly be comparably correlated with black-white differences. A still less serious criticism (indeed, barely a criticism at all), made by several commentators, was that the g that turns up in one battery of tests is likely to differ from the g that turns up in another (e.g., Kline 1985). Jensen accepted this point, noting, however, that the various g’s are themselves intercorrelated.

A number of critics took a nontechnical tack. One set argued that Jensen’s analysis was conceptually circular. For example, if g is defined as intelligence, then tests that are loaded on g will be considered tests of intelligence. If these happen, coincidentally, to be the tests that blacks and whites differ on, then Spearman’s hypothesis will seem to be confirmed, though the link between the tests and intelligence was simply postulated, not proved (Brody 1987). For a related argument see Macphail 1985. Jensen acknowledged that he had not tried to discuss the relationship of g to intelligence in this particular article. Another set of critics made what could be called meta-critical comments, wondering why Jensen should want to uncover relationships that are not very interesting (Das 1985), hurtful to blacks (Das 1985), inimical to world peace (Bardis 1985), and likely to distract attention from the possibility of raising people’s g by educational means (Whimbey 1985). None of these commentaries disputed that the data show what Jensen said they show.

A few years later, the last paper written by the noted psychometrician, Louis Guttman, before his death, attempted to demonstrate a mathematical circularity in Jensen’s argument, concluding that Spearman’s hypothesis is true by mathematical necessity (Guttman 1992). He argued that the factor analytic procedures that are used to extract an estimate of g cannot fail to produce a correlation between g and the B/W difference. If the correlation is present by necessity, concluded Guttman, it can’t be telling us anything about nature. The gist of Guttman’s case is that if g is the only source of correlation across tests, then the varying B/W differences across tests must be correlated with g. Jensen and others were quick to point out that no one now believes that g is the only source of correlation between tests, just the largest one. We will not try to reproduce Guttman’s mathematical argument, not just because it would get us deep into algebra but because it was decisively refuted by other psychometricians who commented on it and seems to have found no other support since its publication. See Jensen 1992; Loehlin 1992; Roskam and Ellis 1992.

101 Gustafsson 1992.

102 Mercer 1984, pp. 297-310.

103 Mercer 1988.

104 Mercer 1988, p. 209.

105 It would be useful for the reader if we could present Mercer’s results so that they parallel the method we have been using, in which the socio-cultural variables and ethnicity are treated as independent variables predicting IQ, but her presentation does not include that analysis.

106 Mercer 1988, p. 208.

107 The critique of Mercer’s position has been highly technical. Readers who have the patience will find an extended exchange between Mercer, Jensen, and Robert Gordon in Reynolds and Brown 1984.

108 Mercer 1984, Tables 6, 9; Jensen 1984b, pp. 580-582.

109 Boykin 1986, p. 61.

110 For review, see Boykin 1986.

111 Ogbu 1986.

112 Flynn 1984, 1987a, 1987b.

113 Merrill 1938.

114 Flynn 1984, 1987b; Lynn and Hampson 1986c.

115 Flynn 1987a, 1987b.

116 Lynn and Hampson 1986a.

117 Teasdale and Owen 1989.

118 For evidence that this is what has happened in the United States, see Murray and Herrnstein 1992.

119 If the mean IQ in 1776 had been 30 and the standard deviation was what it is today, then America in the Revolutionary period had only five men and women with IQs above 100.
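
The arithmetic behind this note can be checked with a short sketch. It assumes that IQ is normally distributed with a standard deviation of 15, and it takes a population of about 3 million for Revolutionary-era America; that population figure is an illustrative assumption, not a number given in the note.

```python
import math


def normal_upper_tail(x, mean, sd):
    """Probability that a normal variable with this mean and SD exceeds x."""
    z = (x - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))


# Assumption for illustration only: roughly 3 million inhabitants in 1776.
ASSUMED_POPULATION_1776 = 3_000_000

# Hypothetical mean of 30 from the note, with today's SD of 15.
share_above_100 = normal_upper_tail(100, mean=30, sd=15)
expected_people = ASSUMED_POPULATION_1776 * share_above_100

print(f"share above 100: {share_above_100:.2e}")
print(f"expected number of people: {expected_people:.1f}")
```

Under these assumptions the expected count is on the order of five people, in line with the figure in the note; a different population assumption would shift the count proportionally.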

120 Lynn and Hampson 1986a.

121 Consider the analogy of height. The average stature of Americans has risen several inches since the Pilgrims landed at Plymouth, but height has run in families nevertheless.

122 A shifting link between IQ and intelligence is not only possible but probable under certain conditions. For example, when the literacy level of a country rises rapidly, scores on conventional intelligence tests will also rise because more people will be better able to read the test. This rise is unlikely to be fully reflected in a rising intelligence level, at least with equal rapidity. Flynn 1987b discusses this general measurement issue.

123 Scarr and Weinberg 1976, 1978, 1983; Weinberg, Scarr and Waldman 1992.

124 Weinberg, Scarr, and Waldman 1992, Table 2. The progression of the IQ means from two black parents to one black/one white to two white parents is not as neatly supportive of a genetic hypothesis as might first appear, because there is reason to suspect that the mixed-race biological parents of the adopted children were disproportionately drawn from college students, which in turn would imply that the IQ of the black parent was well above the black mean.

125 Weinberg, Scarr, and Waldman 1992. For the technical debate, see Levin in press; Lynn in press, with a response by Scarr and Weinberg in Waldman, Weinberg, and Scarr in press.

126 Weinberg, Scarr, and Waldman 1992, Table 2. The overall decline in scores for all groups was because a new test norm had been imposed in the interim, vitiating the Flynn effect for this group.

127 Waldman, Weinberg, and Scarr in press.

128 Eyferth 1961. For accounts in English, see Loehlin, Lindzey, and Spuhler 1975; Flynn 1980.

129 Loehlin, Lindzey, and Spuhler 1975, Chap. 5.

130 An earlier study showed no significant association between the amount of white ancestry in a sample of American blacks and their intelligence test scores (Scarr et al. 1977). If the whites who contributed this ancestry were a random sample of all whites, then this would be strong evidence of no genetic influence on black-white differences. There is no evidence one way or another about the nature of the white ancestors.

131 Lewontin, Rose, and Kamin 1984.

132 Scarr and Weinberg 1976, Table 12.

Chapter 14

1 U.S. Department of Labor 1993, Table 3.

2 U.S. Bureau of the Census 1993, Table 1.

3 The NLSY sample does not include GEDs. Nationally, the 1991 high school completion rate (signifying twelve years of school) was 87.0 percent for whites, 72.5 percent for blacks, and 55.4 percent for Latinos (National Center for Education Statistics 1993, p. 58).

4 These results refer to a logistic analysis in which the dependent variable was a binary variable representing obtaining a normal high school diploma. The independent variables were age and IQ.

5 For persons ages 25 to 29 in 1992, the proportions with bachelor’s degrees were 26.7 percent for whites, 10.6 percent for blacks, and 11.4 percent for Latinos (National Center for Education Statistics 1993, p. 62).

6 Welch 1973.

7 For example, given the mean years of education for people entering the high-IQ occupations defined in Chapter 3 (16.6) and holding age constant at the mean, the probability that whites would be in a high-IQ occupation was 14.4 percent compared to 12.8 percent for blacks and 18.1 percent for Latinos.

8 Gottfredson 1986.

9 Gottfredson 1986 leaves room for the possibility that blacks at the upper end of the IQ distribution were disproportionately choosing medicine, engineering, or the other professions she happened to examine. Perhaps if she had examined other high-IQ occupations (one may hypothesize), she would have found blacks represented at or below expectations. Our analysis, incorporating a broad range of high-IQ occupations, makes this hypothesis highly unlikely. The extension of the analysis in Chapter 20 rules it out altogether.

10 The proportions in high-IQ occupations were 5.8 percent for whites, 3.1 percent for blacks, and 3.7 percent for Latinos.

11 After controlling for IQ, the unrounded proportions in high-IQ occupations were 10.4 percent for whites, 24.5 percent for blacks, and 16.2 percent for Latinos.

12 “Year round” is defined as people who reported being employed for fifty-two weeks in calendar 1989 and reported wage income greater than 0 (excluding a small number who apparently were self-employed and did not pay themselves a wage).

13 This result is based on a regression analysis in which the wage is the dependent variable, age is the independent variable, and the analysis is run separately for each race. The figures reported reflect the mean for a black and white of average age in the NLSY sample.

14 For a more detailed technical analysis of the NLSY experience, reaching the same conclusions, see O’Neill 1990. O’Neill’s collateral findings about the joint role of education and IQ are taken up in Chapter 19.

15 U.S. Bureau of the Census 1993, Table 29.

16 Precisely, 64.4 percent higher, computed using unrounded poverty rates.

17 For various approaches, see Bianchi and Farley 1980; Jargowsky 1993; Massey and Eggers 1990; Smith and Welch 1987; Eggebeen and Lichter 1991. For a summary of the literature, see Jaynes and Williams 1989.

18 U.S. Department of Labor 1993, Table 3.

19 For civilian males not in school and not prevented from working by health problems.

20 Wilson 1987; Lemann 1991; Holzer 1986; Kasarda 1989; Topel 1993; Jaynes and Williams 1989.

21 The proportions in 1960 were 66 percent (blacks) and 72 percent (whites). Computed from Tables 1 and 16, National Center for Health Statistics 1993, and comparable tables in earlier editions.

22 William Julius Wilson is best known for the lack-of-marriageable-males thesis (Wilson 1987), which is currently thought to have some explanatory power (like IQ) but leaves the bulk of the discrepancy unexplained (as does IQ). See South 1993; Fossett and Kiecolt 1993; Bulcroft and Bulcroft 1993; Schoen and Kleugel 1988; Lichter, LeClere, and McLaughlin 1991. For other empirical work bearing on the thesis, see Bennett, Bloom, and Craig 1989; Tucker and Taylor 1989; South and Lloyd 1992; Spanier and Glick 1980; Staples 1985.

23 National Center for Health Statistics 1993, Table 26. Figures in the text are for live births.

24 E.g., Anderson 1989; Bumpass and McLanahan 1989; Duncan and Laren 1990; Ellwood and Crane 1990; Furstenberg et al. 1987; Hogan and Kitagawa 1985; Lundberg and Plotnick 1990; Murray 1993; Rowe and Rodgers 1992; Teachman 1985.

25 Computed from Committee on Ways and Means and U.S. House of Representatives 1993, pp. 688, 697; SAUS 1993, Table 23.

26 These figures, already high, are even higher when the analysis is limited to mothers. The percentages of mothers who had ever been on welfare for blacks, Latinos, and whites were 65.0, 40.5, and 21.8, respectively. We conducted parallel analyses limited to women who had borne a child prior to 1986, giving at least five years’ “chance” for a woman to show up on the AFDC rolls. This had the predictable effect of slightly increasing the percentages of women who had ever received AFDC, but yielded the same substantive conclusions.

27 Intergenerational transmission has some role. See McLanahan and Bumpass 1988; McLanahan 1988. For other discussions touching on racial differences in welfare recipiency, see An, Haveman, and Wolfe 1990; Bernstam and Swan 1986; Bianchi and Farley 1980; Donnelly and Voydanoff 1991; Duncan and Hoffman 1990; Hirschl and Rank 1991; Hofferth 1984; Hogan, Hao, and Parish 1990; Honig 1974; Hutchens, Jackson, and Schwartz 1987; Smith and Welch 1989; Wiseman 1984; Hoffman 1987; Rank 1988; Zabin et al. 1992.

28 National Center for Health Statistics 1993, Table 26.

29 Based on the Colorado Intrauterine Growth Charts.

30 For discussions of reasons for the black-white gap in low-birth-weight babies see David 1990; Kempe et al. 1992; Mangold and Powell-Griner 1991.

31 U.S. Bureau of the Census 1993, Table 3. The Bureau of the Census does not break out “non-Latino whites” in the official statistics. If one assumes that all persons labeled as “Hispanic origin” were white, then 12.9 percent of non-Latino white children were under the poverty line. This is an underestimate for the actual figure, since many persons of Hispanic origin are classified as black. The figure of 14 percent in the text is an estimate that attempts to compensate roughly for the underestimate.

32 The reasons for the gap in black and white child poverty are discussed in the same literature that deals with differences in marriage rates and illegitimacy, which together account for much of the differing financial situations facing black and white mothers of young children.

33 Various approaches to ethnic differences in home environment are Heath 1982; Bardouille-Crema, Black, and Martin 1986; Field et al. 1993; Kelley, Power, and Wimbush 1992; McLoyd 1990; Moore 1985; Pearson et al. 1990; Radin 1971; Tolson and Wilson 1990; Wasserman et al. 1990. A useful older account is Davis and Havighurst 1946.

34 See Jones 1992 on abortion, Abramson and Claggett 1991 on voting, and Elliott and Ageton 1980 on delinquency.

35 See the references (note 33) regarding ethnic differences in home environment.

36 Refers to arrests for index crimes in 1992 relative to the size of the black and white populations. Computed from Federal Bureau of Investigation 1993, Table 43, and SAUS 1993, Table 22. See also Wilson and Herrnstein 1985, Chap. 18.

37 U.S. Bureau of the Census 1993b, Table 305.

38 R. Gordon 1976, 1987.

39 We cannot use the NLSY self-report data for inter-racial comparisons. Self-report crime measures have consistently revealed marked differences in the willingness of black and white youths to disclose crimes. See Elliott and Ageton 1980; Hindelang 1981; Hindelang, Hirschi, and Weis 1981.

40 See the sixteen studies reviewed in Osborne and McGurk 1982. See also the results from the Philadelphia delinquency cohort (Wolfgang, Figlio, and Sellin 1972).

Chapter 15

1 We would, of course, need to know something about the fathers’ scores too. The more complete account comes later in the chapter.

2 Also see Ghiselin and Scudo 1986; Ingle 1973.

3 Soloway 1982.

4 Francis Galton coined the term eugenic. See Galton 1883.

5 The eugenicists were active, but, as we noted in the Introduction, the intelligence testers were not. For an account of what happened prior to the passage of the xenophobic and nativist Immigration Restriction Act of 1924 and how it has gotten distorted in the retelling, see Snyderman and Herrnstein 1983.

6 “Intrinsic birth rates” are birth rates corrected for age distributions. Death rates also decline during the demographic transition, but they will not be discussed in any detail here. Demographers generally believe that differential death rates cease to be a major factor in population growth in modernized societies like ours. This is a supposition that needs to be reassessed, given the probable differential impact of infant mortalities, homicide rates, and AIDS in relation to tested intelligence. Of all the studies we summarize below, only Retherford and Sewell 1988 takes mortality rates into account, but it did not have a nationally representative sample to analyze. We may surmise that the intergenerational decline in intelligence is being mitigated somewhat by differential intrinsic death rates.

7 Retherford 1986; Retherford and Sewell 1988; Vining 1986; Wrong 1980.

8 Retherford 1986; Retherford and Sewell 1988.

9 Becker 1981.

10 E.g., Retherford and Sewell 1988; Rindfuss, Bumpass, and John 1980.

11 Vining 1982a, Vining 1986.

12 Vining 1986.

13 For a sampling of studies that indicate the importance of attitudinal variables for motherhood in many nations, see Booth and Duvall 1981; Hass 1972; Krishnan 1990; Mason and Palan 1981; Youssef 1978.

14 Estimating the phenotypic, as distinguished from the genotypic, change in intelligence across generations is conceptually little more than a matter of toting up the population yielded across the distribution of intelligence, then aggregating the subtotals to get the overall distribution of scores in the next generation, after first taking account of regression to the mean (Andrews 1990; Falconer 1966; Retherford and Sewell 1988). It is not necessary to include any estimate for the heritability of intelligence. This simplicity in conception should not be confused with simplicity in actually making these calculations. Parents in, say, successive deciles of intelligence may have differing intrinsic rates of population growth (or decline) because of varying lifetime fertilities, varying ages at reproduction, and varying mortality rates. Assortative mating by the parents (see Chapter 4) matters in calculation only insofar as it influences the correlation between parents and children. Hence, if fertility is lower at higher levels of intelligence, then assortative mating for intelligence will speed the decline of the population intelligence because it increases the correlation between parents and children. Some of the studies that we cite focus on the genotypic decline rather than the phenotypic (e.g., Retherford and Sewell 1988). Since children resemble the parents who rear them for environmental reasons as well as genetic, the population phenotype will change more rapidly than the population genotype.
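
The tallying described in this note can be sketched as follows. This is a deliberately simplified illustration, not the procedure of any study cited: it ignores age at reproduction and mortality, treats regression to the mean as a single parent-child correlation, and uses three hypothetical parent groups whose IQ means, population shares, and relative fertilities are invented for the example.

```python
def next_generation_mean(groups, parent_child_corr, population_mean=100.0):
    """Mean phenotypic IQ of the next generation.

    groups: iterable of (parent_mean_iq, population_share, relative_fertility).
    Each group's children are expected to regress toward the population mean
    in proportion to the parent-child correlation.
    """
    total_births = 0.0
    weighted_iq = 0.0
    for parent_iq, share, fertility in groups:
        births = share * fertility
        expected_child_iq = population_mean + parent_child_corr * (
            parent_iq - population_mean
        )
        total_births += births
        weighted_iq += births * expected_child_iq
    return weighted_iq / total_births


# Hypothetical groups: (parent mean IQ, share of parents, relative fertility).
equal_fertility = [(85, 0.25, 1.0), (100, 0.50, 1.0), (115, 0.25, 1.0)]
declining_fertility = [(85, 0.25, 1.2), (100, 0.50, 1.0), (115, 0.25, 0.8)]

mean_equal = next_generation_mean(equal_fertility, parent_child_corr=0.5)
mean_declining = next_generation_mean(declining_fertility, parent_child_corr=0.5)
# A higher correlation (for example, from assortative mating) enlarges the shift.
mean_declining_high_corr = next_generation_mean(
    declining_fertility, parent_child_corr=0.6
)

print(mean_equal, mean_declining, mean_declining_high_corr)
```

With equal fertility the mean is unchanged; when fertility falls as parental IQ rises, the next-generation mean drops, and it drops further when the parent-child correlation is higher, which is the role the note assigns to assortative mating.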

15 The best review of the early studies is Anastasi 1956. See also Duncan 1952; Olneck, Wolfe, and Dean 1980; Retherford and Sewell 1988; VanCourt and Bean 1985; Vining 1986.

16 Cattell 1936, Cattell 1937.

17 Retherford and Sewell 1988.

18 Cook 1951, p. 6.

19 As Osborn and Bajema (1972) stated, “The distribution of births in an industrial welfare-state democracy would become more eugenic as the environment improved with respect to health, educational, and occupational opportunities, and particularly with respect to the spread of birth control to the point where freedom of parenthood became a reality for all citizens” (p. 344). The Eugenic Hypothesis was first stated in Osborn 1940.

20 Maxwell 1954; Scottish Council for Research in Education 1949.

21 Cattell 1951. See also Tuddenham 1948.

22 Higgins, Reed, and Reed 1962.

23 Bajema 1963, 1971; Olneck, Wolfe, and Dean 1980; Waller 1971. In addition, as we explained in Chapter 13, the Flynn Effect would have masked any decline in IQ by demographic processes.

24 Cattell 1974; Osborne 1975.

25 Retherford and Sewell 1988.

26 Vining 1982b.

27 VanCourt and Bean 1985.

28 Retherford and Sewell 1988.

29 Ree and Earles 1991a.

30 The simplest way to get around the estimates that scholars have derived would be to measure the IQs of successive generations, following parents and their children, but surprisingly few studies of any size measure cognitive ability in both parents and children, and those few have always been small studies conducted for specific purposes; none has met the crucial criterion of national representativeness. In the United States, the NLSY has the potential to yield such estimates, if the study continues long enough, because it has already initiated a program of testing the children of the NLSY mothers. As of now, however, it provides no interpretable data about the national population as a whole. The women of the NLSY are only partway through their childbearing years (ages 25 to 33 as of our last observation), and the children of the sample are atypical in that they were disproportionately born to young mothers, who may differ in their child-rearing practices from older mothers. The sample is still missing altogether many of the children of women who delay childbearing, who in turn are disproportionately women with advanced education—and high IQs. We can use the mother-child testing data to extract a few clues about ethnic differences, described later in this chapter.

31 See Chapter 17.

32 Not everyone agrees that it is worrisome. In a recent contribution to the fertility debate, Samuel Preston and Cameron Campbell (1993) challenge the premise that negative differential fertility on the microlevel must mean falling national intelligence on the macrolevel. Such negative differentials are compatible, they argue, with a constant, improving, or deteriorating intelligence distribution in the population as a whole. It all depends on how the current differentials relate to past and future fertility patterns. The argument is densely mathematical, and neither the article nor the two accompanying commentaries lend themselves to easy summary. Interpreting the argument is complicated by the fact that the authors operationalized their model with one of the only data sets in which the fertility differential is not negative. However, the narrowest mathematical implication of their model remains accurate: It is possible to postulate conditions that produce a constant or even rising IQ in the face of negative fertility differentials. There is no reason to suppose that those special conditions prevail now or have in the recent past. James Coleman (1993) similarly points out in his commentary that these hypothetical conditions do not have much to do with what is known about the history of fertility, concluding that “their rejection of the common belief about the effect of fertility differences is not warranted. What they have done is not to answer the questions involved, but to frame the problem in a most useful way” (p. 1032).

33 A population has a limited number of ova and an unlimited number of sperm. Therefore, what matters for replacement (net of migration) is how many females are born and what their fertilities are. Hence, since slightly more than 50 percent of births are males and since a few of the females do not reach the age of reproduction, the average woman needs to have approximately 2.1 births to attain replacement fertility.
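
The replacement figure can be reproduced with a rough sketch. Both inputs are illustrative assumptions rather than numbers from the note: a sex ratio of about 105 male births per 100 female births, and a 98 percent chance that a newborn girl survives to reproductive age.

```python
def replacement_fertility(males_per_100_females, female_survival_to_reproduction):
    """Births per woman needed for each woman to be replaced by one daughter
    who herself reaches reproductive age (net of migration)."""
    share_female = 100.0 / (100.0 + males_per_100_females)
    return 1.0 / (share_female * female_survival_to_reproduction)


# Assumed inputs for illustration only.
ASSUMED_SEX_RATIO = 105.0        # male births per 100 female births
ASSUMED_FEMALE_SURVIVAL = 0.98   # share of girls reaching reproductive age

births_needed = replacement_fertility(ASSUMED_SEX_RATIO, ASSUMED_FEMALE_SURVIVAL)
print(f"replacement fertility: {births_needed:.2f} births per woman")
```

Under these assumptions the result is close to the 2.1 births cited in the note; lower female survival would push the figure higher.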

34 Sweet and Rindfuss 1983, Fig. 2. Other countries similarly show the impact of education on fertility. A study of Mexican women in which urbanization, occupation, migration, and education were examined for their effects on fertility found that education was the main depressant. See Pick, Butler, and Pavgi 1988.

35 Based on completed fertility for women ages 35 to 44 in the Bureau of the Census’s Current Population Survey, a nationally representative sample, in June 1992 (Bachu 1993, Table 2). The mean IQ represents the aggregated means by educational level. This calculation assumes that the mean IQ of women at various educational levels is the same for women born from 1948 to 1957 (the national sample represented in the figure on page 349) as it was for the NLSY women born from 1957 to 1964. Is this plausible? Women born from 1948 to 1957 graduated from high school from 1966 to 1975, after the percentage of students finishing high school had hit its peak, after the major shifts in educational recruitment to college had already changed for whites, and after aggressive affirmative action had begun for blacks and to some extent for Latinos. We can think of no reason to assume that the mean IQ of NLSY women (born from 1957 to 1964) at different levels of educational attainment was systematically different than for the cohort of women born from 1948 to 1957, though it could have been.

36 The data report the education of the mother at the time she has a child, but a very young mother may later go back to finish high school, and a woman with a bachelor’s degree may return for a master’s or a Ph.D. In ascribing IQs based on educational attainment, it is important to base them on the final attainment, not just on the years of education at the time of birth. Our procedure for doing so was as follows: Using the NLSY, we first established the difference between education at the time of birth and education as of 1990, when the youngest woman in the NLSY was reaching 26. In the first version of our procedure, it was assumed that the proportion of women who gave birth at ages 26 to 33 (the age range of 98 percent of NLSY women by the 1990 interview) who would subsequently move into a new educational category (the categories were 0-11, 12, 13-15, 16, and 17 or more years of education) was extremely small. We then computed an adjusted version of the table showing births by age by race in National Center for Health Statistics 1993, Table 20, assuming eventual educational attainment equal to that observed in the NLSY (for example, 36.1 percent of NLSY women who had ten years of education when they first gave birth reported twelve years of education by 1990; we recomputed the NCHS cell assuming that 36.1 percent of the women in the NCHS figures who were shown as having ten years of education would eventually get twelve). We then used the adjusted matrix of births by age by race to estimate IQs, using the NLSY mean IQs for women with equivalent years of education. Note that this computation must be done using separate estimates by race, because of the large discrepancy between the IQs of blacks and whites of equivalent years of education. This first iteration yielded an estimated mean IQ of mothers for the 1991 U.S. birth cohort of 97.9.
We then repeated the process, using a sample limited to births that occurred by the end of 1986, meaning that each mother had at least four years of postbirth observation to see if she went back to school. This version avoided the assumption that women ages 26 and over seldom go back to school, at the cost of reducing sample sizes and perhaps introducing some unrepresentativeness into the truncated sample. The estimated IQ for the mothers of the 1991 U.S. birth cohort using this procedure was 98.0.
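
The reassignment step can be illustrated with a minimal sketch for a single race and age cell. Only the 36.1 percent transition from ten to twelve years of education comes from the note; the birth counts and the mean IQs by education level below are hypothetical placeholders, and the function names are invented for the example.

```python
def reassign_education(births, transitions):
    """Move a proportion of births from one education level to another.

    births: {years_of_education: number_of_births}
    transitions: {(from_years, to_years): proportion_moved}
    """
    adjusted = dict(births)
    for (src, dst), proportion in transitions.items():
        moved = births[src] * proportion
        adjusted[src] -= moved
        adjusted[dst] = adjusted.get(dst, 0.0) + moved
    return adjusted


def weighted_mean_iq(births, mean_iq_by_education):
    """Mean IQ of mothers, weighting each education level by its births."""
    total = sum(births.values())
    return sum(n * mean_iq_by_education[e] for e, n in births.items()) / total


# Hypothetical cell counts and hypothetical mean IQs by final education level.
births_at_time_of_birth = {10: 1000.0, 12: 3000.0}
hypothetical_mean_iq = {10: 85.0, 12: 97.0}

# From the note: 36.1 percent of women with ten years eventually reach twelve.
adjusted_births = reassign_education(births_at_time_of_birth, {(10, 12): 0.361})

iq_unadjusted = weighted_mean_iq(births_at_time_of_birth, hypothetical_mean_iq)
iq_adjusted = weighted_mean_iq(adjusted_births, hypothetical_mean_iq)
print(adjusted_births, iq_unadjusted, iq_adjusted)
```

As the note stresses, the full computation repeats this separately for each race and age cell before aggregating.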

37 The actual figure, based on all births through 1990, was 95.7. It is produced by taking the mean (using sample weights as always) of the IQ associated with the mother of each child born to an NLSY mother.

38 Out of every 100 women ages 30 to 34 in 1990, only 2 had their first birth that year; after age 34, the proportion fell rapidly to near zero. See Bachu 1991, Table 4. We realize that many readers know personally of numerous women who had their first babies in their late thirties. It is one more useful example of the difference between the world in which most of our readers live and the rest of the country.

39 Women of the NLSY who had reached ages 32 to 33 may be expected to have borne about 83 percent of all the babies they will ever bear (interpolated from National Center for Health Statistics 1991, Table 2).

40 The biases will understate the age differential by cognitive class because (based on known patterns of childbearing by women of different educational groups) the largest change in the final mean age of births will occur among the brightest women.

41 Bachu 1993, Table 2.

42 This finding echoes points made in other places. We showed earlier (see Chapter 8) that it is not IQ per se that depresses fertility but the things that a higher IQ results in, such as more education (see Retherford and Sewell 1989; Rindfuss, Morgan, and Swicegood 1980). At given IQ scores, blacks get more schooling than either whites or Latinos (Chapters 13, 18). Hence we should not be surprised that, at given IQ scores, blacks have lower fertility than either of the other groups; they are more likely to be still in school.

43 Rindfuss, Morgan, and Swicegood 1980; Osborne 1973; Chen and Morgan 1991b.

44 Chen and Morgan 1991a; Rindfuss, Morgan, and Swicegood 1988.

45 The quotation is taken from Baker and Mott 1989, p. 24.

46 To mention just one of the most important reasons to hedge, the participation of Latino mothers in the NLSY testing program was comparatively low, making the white-Latino comparison quite tentative. And as we cautioned in Chapter 14, the PPVT is probably less valid for Latinos than for other groups. This may bear on the comparison between Latino-white differences among mothers and among children. In any case, the figure for the apparent dysgenic effect for the Latino-white comparison is small enough to deter strong conclusions.

In contrast, the black-white apparent dysgenic effect is large, and we examined it using several methods to see if it might be spurious. The table on page 356 reports the results using the children’s sample weights, and comparing tested children with the mothers of those children, counting a mother more than once if she had more than one child and counting the same child more than once if he or she had been tested in more than one year (after turning 6). If we repeat the same calculation but including all children who were tested (including those under the age of 6), the black-white difference among the mothers is 13.9 points, compared to a difference among the children of 20.0 points, an even larger dysgenic difference than the one produced by the children ages 6 and older. Another approach is to discard the sample weights (which are problematic in several respects, when comparing across test years) and instead restrict the sample to children born to mothers who were in the cross-sectional NLSY sample. Doing so for all children who took the PPVT after the age of 6 produces a B/W difference of 14.8 points for the mothers and 18.1 points for the children, or a dysgenic difference of 3.3 points. Doing so for all children who took the PPVT produces a B/W difference of 14.9 points for the mothers and 19.4 for the children, or a dysgenic difference of 4.5 points.

Our next step was to examine separately the results from the three test years (1986, 1988, and 1990). For the children who were 6 or older when they took the test (which again shows a smaller difference than when the test includes all children), the B/W differences for the three test years, using sample weights, were 5.9, 1.9, and 3.0 points, respectively. The differences across test year did not affect the conclusion that a significant dysgenic effect exists, but the reasons for the differences are worth investigating.

In our attempt to see whether the dysgenic effect could be attenuated, we repeated all of these analyses with one difference: Instead of using the national norms for the PPVT (normed to a mean of 100 and SD of 15), we let the NLSY children be their own reference group, comparing the black and white scores using the observed mean and standard deviation for all NLSY children who took the test. This procedure reduces the estimate of the dysgenic effect. For example, the results, using sample weights, for the children who were 6 and older, showed an increasing B/W gap of 1.9 points instead of the 3.9 points produced by using the national norms. The difficulty in interpreting this finding is that the procedure itself has no good rationale. The PPVT national norms seem to have been properly determined. If anything, the Flynn effect should mean that the NLSY children, taking the test anywhere from seven to eleven years after the norms were established, should have a 2- to 3-point IQ edge when compared to the national norms. So we have no reason to think that the lower estimate is the correct one, but it does represent the best way we could concoct to minimize the B/W dysgenic effect.

Finally, we explored how the births to NLSY women might affect these findings by comparing black and white women who had not borne a child as of 1990. The mean IQ for the childless white women was 106.6, compared to 100.3 for childless black women. That black women without children have a mean of 100 is in itself striking evidence of the low fertility among the top part of the black IQ distribution, but even if subsequent fertility for the two groups is the same, the B/W gap in the next generation will presumably continue to diverge as the NLSY women complete their fertility.

47 New York Times. “Slighting words, fighting words.” Feb. 13, 1990, p. A24.

48 The computation in the text counts each mother as many times as she had children who were tested. If instead each mother is counted only once, the white-black difference among mothers is 1.12 SDs. The white-Latino difference is 1.05 SDs.

49 Auster 1990; Bouvier 1991; Gould 1981; Simon 1989; Wattenberg 1987; Wattenberg and Zinsmeister 1990.

50 Holden 1988.

51 E.g., Higham 1973; Lukacs 1986.

52 Simon 1989. For a symposium, see Simon et al. 1993.

53 Auster 1990, and various contributors in Simon et al. 1993.

54 Bouvier and Davis 1982. This particular estimate is based on annual immigration of 1 million.

55 The figures for the 1950s, 1960s, and 1970s were 11 percent, 16 percent and 18 percent respectively. SAUS 1992, Table 14 (SAUS 1971, Table 4).

56 Lynn 1991.

57 SAUS 1992, Table 8. The figures also include once-illegal immigrants who were granted permanent residence under the Immigration Reform and Control Act of 1986.

58 Sowell 1981.

59 A first, elementary consideration is that the NLSY data refer almost exclusively to the children of the adults who decided to immigrate. Whatever self-selection for IQ might have existed in the elders will be less visible in their offspring.

60 Carliner 1980; Chiswick 1978; Gabriel 1991.

61 Borjas 1987. Borjas’s formulation also draws on Roy 1951 and Sjaastad 1962. In forthcoming papers, Borjas has since extended his analysis through the 1990 census, showing a continuation of the trends from 1970 to 1980. Borjas 1993, 1994.

62 Borjas 1987, Table 3.

63 Sowell 1981, p. 220.

64 Borjas 1987, Table 3.

65 Borjas 1987, p. 552.

66 The procedure is limited to the NLSY’s cross-sectional sample (i.e., omitting the supplemental samples), so that sample weights are no longer an issue. Using random numbers, subjects with IQ scores above 97 had an equal chance of being discarded. Because different subsamples could yield different results, we created two separate samples with a mean of 97 and replicated all of the analyses. The data reported in the table on page 368 represent the average produced by the two replications, compared to the national mean as represented by unweighted calculations using the entire cross-sectional sample.

67 Cattell 1938, as reprinted in Cattell 1983.

68 Cattell 1983, pp. 167, 168.

69 Cattell 1983, pp. 167, 175.

70 Cattell 1983, pp. 167, 169.

71 The procedures parallel those used for the preceding analysis of a mean of 97.

72 In effect, our sample with a mean of 97 shows what happens when people with above-average IQs decrease their fertility, and our sample of 103 shows what happens when people with below-average IQs decrease theirs. When we changed the NLSY sample so that the mean fell to 97, we used a random variable to delete people with IQs above 97 until the average reached 97. This did not do much to get rid of people who had the problems; most of its effect was to diminish the supply of people without problems. When we changed the NLSY sample so that the mean rose to 103, we were randomly deleting people with IQs below 103. In the course of that random deletion, a significant number of people toward the bottom of the distribution—our Classes IV and V—were deleted. Suppose instead we had lowered the IQ to 97 by randomly duplicating subjects with IQs below 97. In that case, we would have been simulating what happens when people with below-average IQs increase their fertility, and the results would have been more closely symmetrical with the effects shown for the 103 sample.
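The random-deletion procedure described in notes 66 and 72 can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `thin_to_mean`, the seed values, and the synthetic normal sample standing in for the NLSY cross-sectional data are all assumptions made for the example.

```python
import random


def thin_to_mean(scores, target, seed=0):
    """Randomly discard scores above `target` until the mean is at most `target`.

    Hypothetical helper: every subject above the target has an equal chance
    of being discarded, and subjects at or below the target are always kept.
    """
    rng = random.Random(seed)
    above = [s for s in scores if s > target]
    kept = [s for s in scores if s <= target]
    rng.shuffle(above)  # random order gives each above-target subject equal odds
    total = sum(scores)
    count = len(scores)
    # Drop one above-target subject at a time until the running mean reaches the target.
    while above and total / count > target:
        total -= above.pop()
        count -= 1
    return kept + above


# Synthetic stand-in for the sample (assumption: mean 100, SD 15, 10,000 subjects).
rng = random.Random(1)
sample = [rng.gauss(100, 15) for _ in range(10_000)]

lowered = thin_to_mean(sample, 97)
print(len(sample), len(lowered))
print(round(sum(lowered) / len(lowered), 2))
```

Because only above-target subjects are removed, everyone at or below 97 survives the thinning, which is the asymmetry note 72 points out. Raising the mean to 103 would mirror this by deleting subjects below 103 instead.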

73 These figures continue to be based on the cross-sectional NLSY sample, used throughout this exercise. The 1989 poverty rate for the entire NLSY sample, calculated using sample weights, was 10.9 percent.

Chapter 16

1 A woman was classified as a chronic welfare recipient if she had received welfare for at least five years by the 1990 interview. Women with incomplete data on AFDC in the years following the birth of the first child or whose first child was born after 1985 were not scored on this variable.

2 We do not weight the computations for the overrepresentation of below-average IQ mothers, but we continue to use sample weights.

3 This represents the mean of the mothers of the NLSY children, with each mother counted once for each illegitimate child. Because of the inverse relationship between IQ and the number of illegitimate children, the mean counting each mother of an illegitimate child only once was higher: 89.

4 As in the case of illegitimacy, IQ and the number of children of divorced and separated mothers were inversely related. When the mother is counted only once regardless of the number of children, the mean is 94.

5 See Chapter 10 for a description of this intelligence test (the PPVT).

Chapter 17

1 A brief refresher (see Chapter 4): A heritability of 60 percent (a mid-range estimate) says that 40 percent of the observed variation in intelligence would disappear if a magic wand wiped out the differences in those aspects of the environment that bear on intelligence. Given that variance is the standard deviation squared and that the standard deviation of IQ is 15, this means that 40 percent of 15² (that is, 90 of 225) is due to environmental variation, which is to say that the variance would drop from 225 to 135 and the standard deviation would contract to 11.6 instead of 15 if all the environmental sources of variation disappeared.
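The arithmetic in this note can be checked with a short sketch. The variable names are illustrative. The only inputs are the two figures the note itself supplies: a standard deviation of 15 and a heritability of 60 percent.

```python
import math

SD_IQ = 15.0         # standard deviation of IQ, as stated in the note
HERITABILITY = 0.60  # mid-range estimate used in the note

total_variance = SD_IQ ** 2                                   # 15 squared = 225
environmental_variance = (1 - HERITABILITY) * total_variance  # 40 percent of 225
remaining_variance = total_variance - environmental_variance  # variance left if environment were equalized
remaining_sd = math.sqrt(remaining_variance)                  # back on the IQ-point scale

print(total_variance, round(environmental_variance), round(remaining_variance))
print(round(remaining_sd, 1))
```

The standard deviation shrinks by less than 40 percent because the reduction applies to the variance, and the standard deviation is its square root.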

2 “A healthy mind in a healthy body.” Some of the history is recounted in Lynn 1990b. Abstracts of a series of studies by Stephen Schoenthaler and his associates on the effects of diet on intelligence and on antisocial, criminal behavior are in Schoenthaler 1991.

3 Stein et al. 1972.

4 Lynn 1990b.

5 Benton and Roberts 1988.

6 At the age of 12 and 13, youngsters’ scores rise during an eight-month period in the natural course of events. The dietary supplement, then, is affecting the rate of increase of the nonverbal, but not the verbal, scores.

7 Schoenthaler et al. 1991.

8 WISC-R. Block Design, a highly g-loaded subtest of WISC-R, showed little or no benefit of the food supplement.

9 Earlier work suggesting that reductions in refined sugar increase intelligence is now being reinterpreted as the effect not of sugar per se but of shifting the diet away from foods with little in the way of vitamins and minerals to more nutritious foods; see Schoenthaler et al. 1991; Schoenthaler, Doraz, and Wakefield 1986. The basic point is that we have almost no idea of the pathway between diet or food supplements and intellectual development; assuming there is a path, it could be long and winding.

10 A child taking a pill that gives, say, one RDA is getting more than the recommended daily allowances, since the rest of his diet cannot be utterly devoid of vitamins and minerals.

11 For a failure to confirm an effect of vitamin-mineral supplements, see Crombie et al. 1990, and for a failure to find an effect on intelligence of diet short of chronic malnutrition, see Church and Katigbak 1991. For more general discussion of the issue, see Eysenck 1991; Lynn 1990; Yudkin 1991.

12 Later children are on the average born into larger families, which tend to be of lower average IQ. Hence, there is a decline with successive births that is a by-product of family size in and of itself. However, even after the family size effect is extracted, there may be a decline with birth order. The classic demonstration of declining scores with successive births independent of family size is a study based on a large sample of Dutch men (Belmont and Marolla 1973; Belmont, Stein, and Zybert 1978). Since then, subsequent studies have both confirmed and failed to confirm the basic relationship (e.g., Blake 1989; Retherford and Sewell 1991; Zajonc 1976). At present, there is no resolution of the varying findings.

13 Representative findings, on Japanese twins, are in Takuma 1966, described in Iwawaki and Vernon 1988.

14 For a review of the literature on twin differences in birth weight in relation to IQ as well as of other evidence that the uterine environment affects intelligence, see Storfer 1990.

15 Achenbach et al. 1990. This study compared two dozen low-birth-weight babies whose mothers received training in mothering with comparably small groups of normal-weight babies and low-birth-weight babies whose mothers did not receive the training. The encouraging outcome is that when the children were 7 years old, the usual deficit seems to have been forestalled by having trained the mothers in infant nurturing. However, the small scale of the study, the lack of random assignment to the three groups, and the puzzling near identity in scores for the underweight children whose mothers had been trained and the normal children suggest that the next step should be to attempt to replicate the finding, as the authors themselves say.

16 For a helpful and balanced introduction to aptitude-treatment interactions, see Snow 1982.

17 Hativa 1988.

18 Atkinson 1974.

19 Cook et al. 1975.

20 Coleman et al. 1966. The report talked about educational “aptitude,” but the measures used—vocabulary scores, reading comprehension, mathematical reasoning tasks, etc.—were taken from standard group tests of IQ.

21 See Mosteller and Moynihan 1972 for a collection of more or less critical articles; included also is Coleman’s response to the most intense methodological criticisms (Coleman 1972). The combatants were often trying to answer different questions, with Coleman mostly interested in whether the objective differences among schools were responsible for the observed differences in abilities and his critics more interested in characterizing the objective differences in the schools. We cannot do justice to the range of issues that surfaced in the report and the subsequent commentary, but one of them deserves mention: The report uncovered evidence that the ethnic and socioeconomic mix of students in a school had a larger impact than the more standard investments in per pupil expenditures, teacher salaries, quality of physical plant, and the like. This, in turn, became a major argument for school busing. Soon after, school busing itself became a battleground for social researchers, a tale we will not tell here except to say that having a beneficial effect on intelligence is no longer used as an argument in favor of busing.

22 Coleman and Hoffer 1987.

23 It isn’t hard to find what seems to be the opposite conclusion in educational writings (e.g., the Coleman report is “no longer taken seriously,” Zigler and Muenchow 1992, p. 62) but no one has been able to show that the variables examined in the report account for much of the variation in cognitive ability among American public school students. If they are in any sense not taken seriously, it is presumably because educational variables other than the ones that Coleman studied have been found to be significant. This chapter reviews the evidence about those other variables as well.

24 See Kozol 1992 for a passionate argument that disparities in school funding are a major cause of disparities in educational outcomes.

25 Husén and Tuijnman 1991.

26 The quantitative details of the study are not germane to contemporary times, but even then, when schooling varied so broadly, the direct link between IQ at the age of 10 and at 20 was a minimum of five times stronger than that between amount of schooling and IQ at 20, in terms of variance accounted for in a path analysis.

27 Flynn himself does not believe that educational equalization per se accounts for much of the rise in IQ in some countries such as Holland (Flynn 1987a), but then Flynn also does not believe that the rising national averages in IQ really reflect rising intelligence.

28 Stephen Ceci (1991) has summarized evidence, much of it from earlier in the century, for an impact of schooling on intelligence.

29 National Center for Education Statistics 1981, Table 161; 1992, Table 347.

30 McLaughlin 1977, p. 55.

31 McLaughlin 1977, p. 53. The failure of such compensatory efforts antedated the Great Society by many years, however. An early educational researcher writing of similar compensatory efforts in 1938 concluded that “whatever the number of years over which growth was studied; whatever the number of cases in the several groups used for comparisons; whatever the grade groups in which the IQs were obtained; whatever the length of the interval between initial and final testing; in short, whatever the comparison, no significant change in IQs has been found” (Lamson 1938, p. 70).

32 Office of Policy and Planning 1993.

33 For more on this distinction, see Adams 1989; Brown and Campione 1982; Jensen 1993a; Nickerson, Perkins, and Smith 1985.

34 “Chicago educator pushes common sense,” St. Louis Post Dispatch, Dec. 2, 1990, p. 5D; “Marva Collins still expects, gets much,” St. Petersburg Times, July 23, 1989, p. 6A; “Pioneering educator does not want post in a Clinton cabinet,” Minneapolis Star Tribune, Oct. 25, 1992, p. 22A.

35 Spitz 1986. See also “Chicago schools get an education in muckraking,” Chicago Tribune, May 8, 1989, p. 1C.

36 “Fairfax principal, 4 other educators disciplined in test-coaching,” Washington Post, Aug. 7, 1987, p. C1.

37 “Pressure for high scores blamed in test cheating,” Los Angeles Times, Sept. 18, 1988, p. 1.

38 “S.I. principal said to fudge school scores,” New York Times, July 19, 1991, p. B1.

39 For a sense of the magnitude of the cheating problem, see “Schools for Scandal,” U.S. News & World Report, April 27, 1992, p. 66.

40 The minister was Luis Alberto Machado, a high official in the ruling party at the time.

41 Based on estimates in the preceding years, the children in the two groups were chosen to be of comparable cognitive ability. For descriptions of the experiment, see Herrnstein et al. 1986; Nickerson 1986.

42 The teachers’ manual for most of the lessons, translated into English, is available as Adams 1986.

43 See Brigham 1932 for the relevant background. Briefly, the SAT was originally designed to be an intelligence test targeted for the college-going population and was originally validated against existing intelligence tests. For a modern source showing how carefully the College Board avoids saying the SAT measures intelligence while presenting the evidence that it does, see Donlon 1984.

44 Fallows 1980; Slack and Porter 1980; Messick 1980; DerSimonian and Laird 1983; Dyer 1987; Becker 1990.

45 Messick and Jungeblut 1981.

46 From 1980 to 1992, the SAT-V standard deviation varied from 109 to 112 and the SAT-M standard deviation varied from 117 to 123. For the calculations, we assumed SDs of 110 and 120, respectively.

47 McCall 1979.

48 McCall 1987.

49 Alexander Pope (in his Moral Essays) is the poet, and the entire couplet is “Tis education forms the common mind; / Just as the twig is bent the tree’s inclined.”

50 See Mastropieri 1987 for a review of the expert consensus on this point.

51 For a sympathetic rendition of the program and its history, see Zigler and Muenchow 1992. For a more critical account, see Spitz 1986. We try to keep our account as close to what these two have in common as we can.

52 “Project Rush-Rush” was what Head Start was called by those in Washington who thought that it was plunging ahead with more speed than deliberation (quoted in Caruso, Taylor, and Detterman 1982, p. 52).

53 Zigler and Muenchow 1992, reporting the conclusions of Leon Eisenberg and C. Keith Connors after the first summer program. Only slightly less grandiose were the claims of raising IQ scores “a point a month” that were often cited by enthusiasts.

54 Sargent Shriver, brother-in-law of the late president, John Kennedy, and former head of the Peace Corps.

55 The first comprehensive evaluation was the so-called Westinghouse study, which the Office of Economic Opportunity sponsored. Its conclusion was that there were few or no cognitive benefits of Head Start within three years after the child completed it (Cicarelli, Evans, and Schiller 1969). Soon there was a mini-industry picking over the Westinghouse study, in addition to the one picking over Head Start. The consensus is now clear: Cognitive gains vanish before the end of primary school, e.g., Haskins 1989; McKey 1985; Spitz 1986; Zigler and Muenchow 1992. The new consensus has recently surfaced in the popular media (e.g., J. DeParle, “Sharp criticism for Head Start, even by friends,” New York Times, Mar. 19, 1993, p. A1).

56 For a range of views, see Gamble and Zigler 1989; McKey 1985; Zigler and Muenchow 1992.

57 E.g. Haskins 1989.

58 Zigler and Muenchow 1992. Edward Zigler, one of the early research directors of Head Start and a professor at Yale, argues in his book that it was a mistake from the beginning to promise gains in intelligence to the public. The more general shift away from making increases in IQ the target of preschool programs is discussed in Garber and Hodge 1991; Locurto 1991; Schweinhart and Weikart 1991, pro and con.

59 Among the people promising gains in the 300 percent range is the president of the United States, as reported by Jason DeParle (“Sharp criticism for Head Start, even by friends,” New York Times, Mar. 19, 1993). Even more of an optimist is economist Alan Blinder, who once promised a return of $4.75 for every dollar spent on preschool education (Blinder 1987).

60 For a review of such benefits from Head Start programs, see Haskins 1989, who concludes that the results “call for humility” (p. 280). The Head Start literature, he says, “will not support the claim that a program of national scope would yield lasting impacts on children’s school performance nor substantial returns on the investment of public dollars” (p. 280). In short, there are no sleeper effects from Head Start. Even the evidence of cost-effective returns in the more intensive educational programs is highly restricted. For a literature review, see Barnett and Escobar 1987.

61 Most of the children were 3 years old and spent two years in the program; the 22 percent who were 4 spent only one year in it (Barnett 1985; Berrueta-Clement et al. 1984).

62 Half a school day, or about two and a half hours.

63 The lack of effect was indirectly confirmed in a subsequent study by the same group of workers. They failed to find any differential effect on IQ of three different forms of preschool: their own cognitive enrichment program, a language-enhancing program, and a conventional nursery school program (Weikart et al. 1978). There was no control group in this follow-up, so we cannot say how much, if at all, preschool per se influenced IQ.

64 For a critical reading of just how minimal these other effects of preschool may have been, see Spitz 1986.

65 Lazar and Darlington 1982.

66 Similar estimates can be found in a study of the early effects of Head Start and the consortium sample (Lee et al. 1990).

67 Lazar and Darlington 1982, p. 47. The people who do these studies often argue that other positive effects are not being picked up in the formal measurements (e.g., Ramey, MacPhee, and Yeates 1982).

68 Many publications have flowed from the project; useful summaries are in Ramey 1992; Ramey, MacPhee, and Yeates 1982.

69 Personal communication from Ron Haskins.

70 Ramey 1992.

71 These differences are clearer in the critical accounts of the project in Spitz 1986 and 1992 than in the report by Ramey, MacPhee, and Yeates 1982.

72 Herrnstein 1982; Sommer and Sommer 1983.

73 Page 1972; Page and Grandon 1981.

74 Garber 1988; Garber and Hodge 1991.

75 Jensen 1989; Locurto 1991. The problem of “teaching to the test” recurs in educational interventions. It is based on the test’s being less than a perfect measure of intelligence (or g), so that it is possible to change the score without changing the underlying trait (see further discussion in Jensen 1993a).

76 Our topic here is the effect of adoption on raising IQ, not the implications of adoption data for estimating the heritability of IQ. For reviews of the adoption literature, see Herrnstein 1973; Locurto 1990; Munsinger 1975; Plomin and DeFries 1985. A comprehensive theoretical analysis of adoption studies of intelligence is in Turkheimer 1991.

77 Brown 1958, Chap. 5; Lane 1976; Lane and Pillard 1978.

78 Among others inspired by this evidence from “wild children” of the power over the mind of the human environment was an Italian physician trained at the end of the nineteenth century whose approach to education has survived the twentieth, Maria Montessori.

79 Locurto 1990; Plomin and DeFries 1985. In a refinement of this observation, it has been found that adopted children also score lower than the children in other homes that are socioeconomically the same as those of their adoptive parents but have no adopted children (thereby controlling for possible ways in which adoptive parents might be distinctive from non-adoptive parents).

80 Locurto 1990.

81 Dumaret and Stewart 1985; Schiff et al. 1982; Schiff and Lewontin 1986.

82 We will disregard in our analysis a number of considerations that would reduce estimates of the impact of home environment, such as that the IQ of the schoolmates of the nonadopted half-siblings (who presumably share comparable lower-class surroundings) averaged only seven points less than the adopted children, not twelve. This difference raises the possibility that the adopted-away child seemed brighter in infancy or had better intellectual prospects than the half-sibling who stayed at home because of the parent they did not share, or that the shift in home environments was even more extreme than the estimates below assume it was, as if the adopted child’s biological family home was atypically poor, even for the poor neighborhoods they were in. This, as we explain below, would reduce the over-all estimate of the impact of home environment.

83 The cell sizes in the 2 × 2 table of high- and low-SES adopting and biological parent families were only ten children or fewer.

84 Capron and Duyme 1989. This study showed an even larger benefit—equivalent to sixteen IQ points—of having high-SES biological parents, even when the child was not reared by them, which again points to a heritability greater than .5.

85 This, it should be remembered, is for childhood IQ, which is more subject to the influence of home environment than adult IQ. Recent work has also indicated that how a parent treats a child (presumably also an adopted child) is in part determined by the child’s inherited characteristics. To that extent, speaking of home environment as if it were purely an environmental source of variation is incorrect (see Plomin and Bergeman 1991).

86 A twenty-point swing is easily reconciled with a heritability of .6 for IQ. Suppose the high- and low-SES homes in the French studies represent the 90th and 10th centile of environmental quality, as the text says. A twenty-point swing in IQ from the 2d to the 98th centile of environmental quality would then imply that the standard deviation of home environment effects on IQ is 4.69. Squared, this means a variance of 22 attributable to home environment. But as we noted in note 1, a heritability of .6 implies that there is a variance of 225-135, or 90, attributable to environmental sources. The French adoption studies, in short, are consistent with the conclusion that about a quarter of environmental variance is the variance across homes (if our guesses about the adopting and biological home environments are not way off). Three-quarters of the environmental influence on intelligence must be uncorrelated with the family SES, according to the present analysis. Note again that the balance tips toward environmental factors outside families as being more relevant than those provided by families in affecting IQ, as mentioned in Chapter 4.
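The last steps of this note, from the stated standard deviation of 4.69 to the "about a quarter" conclusion, can be followed in a short sketch. It takes the note's own figures as given and does not re-derive the 4.69 from the centile assumptions. The variable names are illustrative.

```python
HOME_SD = 4.69              # SD of home-environment effects on IQ, as stated in the note
TOTAL_VARIANCE = 225        # 15 squared, from note 1
GENETIC_VARIANCE = 135      # variance remaining at a heritability of .6, from note 1

home_variance = HOME_SD ** 2                                 # roughly 22
environmental_variance = TOTAL_VARIANCE - GENETIC_VARIANCE   # 90
home_share = home_variance / environmental_variance          # share of environmental variance across homes

print(round(home_variance))
print(environmental_variance)
print(round(home_share, 2))
```

The complement of `home_share` is the portion of environmental variance that the note attributes to factors uncorrelated with family SES.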

87 For a discussion of cost-benefit considerations, see Haskins 1989.

Chapter 18

1 “Sharpen your pencil, and begin now,” Wall Street Journal, June 9, 1992, p. A16.

2 National Commission on Excellence in Education 1983, p. 5.

3 National Commission on Excellence in Education 1984, p. 58.

4 For an example of an alarmist view and a discussion of the various estimates, see Kozol 1985.

5 National Center for Education Statistics 1992, Table 12-4.

6 DES 1992, Table 95.

7 Ravitch and Finn 1987, p. 49.

8 Congressional Budget Office 1987, p. 16.

9 Congressional Budget Office 1987, p. 16.

10 Quoted in Kozol 1985, p. 9.

11 Four of the studies were conducted by the International Association for the Evaluation of Educational Achievement, known as the IEA. They were the First International Mathematics Study (FIMS), mid-1960s; the First International Science Study (FISS), 1966-1973; the Second International Mathematics Study (SIMS), 1981-1982; and the Second International Science Study (SISS), 1981-1982. The fifth study was initiated by the United States as a spin-off from NAEP. It was conducted in 1988 and is known as the First International Assessment of Educational Progress (IAEP-I) (Medrich and Griffith 1992).

12 Medrich and Griffith 1992, Appendix B.

13 National Center for Education Statistics 1992, pp. 208-215.

14 The best single source for understanding the complexities of international comparisons is the summary and synthesis produced by the National Center for Education Statistics (Medrich and Griffith 1992). Other basic sources in this literature are Walker 1976; McKnight et al. 1989; Keeves 1991. There are cultural factors too. In his vigorous defense of American education, Gerald Bracey tells of the scene in a Korean classroom during one such international test: “As each Korean student’s name was called to come to the testing area, that child stood and exited the classroom to loud applause. What a personal honor to be chosen to perform for the honor of the nation!” American children seldom react that way, Bracey observes (Bracey 1991, p. 113).

15 Bishop 1993b; National Center for Education Statistics 1992a, pp. 60-61.

16 In addition to Bishop 1989, reviewed below, see especially Carlson, Huelskamp, and Woodall 1993; Bracey 1991.

17 Bishop 1989.

18 The Flynn effect refers to gradually rising scores over time on cognitive ability tests, discussed in Chapter 13.

19 NAEP periodically tests representative samples of students at different age levels in mathematics, reading, science, and, more recently, in writing and in history and literature.

20 National Center for Education Statistics, 1991, Fig. 1. The tests were designed to have a mean of 250 and a standard deviation of 50 when taken across all three age groups. The exception to flat trend lines was science performance among 17-year-olds, which shows a fifteen-point decline from 1969 to 1990, somewhat more than .3 SD (we do not know the specific standard deviations for 17-year-olds on the science test; probably it is less than 50). Note also that science among 17-year-olds reflects disproportionately the performance of the above-average students who tend to take high school science—consistent with our broader theme that educational performance deteriorated primarily among the gifted.

21 Two large questions about the table on page 422 immediately present themselves. First, are the five studies accurate representations of the national samples that they purported to select, and are the five tests comparable with each other? The answer to the first half of the question is a qualified yes. The studies were not perfect, but all appear to have been well designed and executed. The qualification is that the data exclude youngsters who did not reach the junior year in high school. The answer to the second half of the question is cloudier, if only because sets of tests administered at different times to different samples always introduce incomparabilities with effects that cannot be assessed precisely. The prudent conclusion regarding the math scores is to discount the modest fall and rise from 1955 to 1983 and assume instead that math aptitude over that period was steady. Regarding the Verbal scores, it seems likely that they rose from 1955 to 1966 and dropped from sometime after 1966 to sometime between 1974 and 1983, with the magnitude and precise timing of those shifts still open to question. Before leaving the norm studies, we must add a proviso: the SAT scales got easier during 1963 to 1973 by about eight to thirteen points on the Verbal and perhaps ten to seventeen points on the Math. They seem to have been stable before and following this period (Modu and Stern 1975, 1977). The same person would, in other words, have earned a higher score on the later SATs than the earlier ones, owing purely to changes in the test scales themselves. Whether the PSAT, a much shorter test, experienced the same degree of drift is unknown, but it is a good idea to adjust mentally the 1974 and 1983 scores downward a bit, though this does not change the overall interpretation of the results.

22 Grades 10 and 11 show a similar pattern. Grade 12 remained slightly under its high (1965-1967) as of 1992, but it is likely that the deficit is explained by increases in the proportion of 17-year-olds retained in school. The possibility remains open, however, that education in the post-slump period improved more in the lower grades than in the higher ones.

23 Congressional Budget Office 1986.

24 Medrich and Griffith 1992.

25 The College Board added a new method of reporting test scores in 1967 based on seniors instead of all tests administered, and continued to report the means for both types of samples through 1977. During the years when both scores were available, the trends were visually almost indistinguishable. In the year when we employed the new measure in the graph on page 425, 1970, the scores for the two methods were identical.

26 Based on the 1963 standard deviations, .49 and .32 SD reductions respectively.

27 For a technical statement of this argument, see Carlson, Huelskamp, and Woodall 1993.

28 Readers can follow the journey through the numbers in Murray and Herrnstein 1992.

29 It is possible that the SAT pool was not getting democratized in the usual socioeconomic sense but was nevertheless beginning to dig deeper into the cognitive distribution. Responses in the SAT student questionnaire indicate that somewhat more students from the bottom of the class were taking the test in 1992 than in 1976, but this effect was extremely small for whites. In 1980, 72.2 percent of whites reported that they were in the top two-fifths of their high school class, compared to 71.5 percent in 1992. We nonetheless explored the possibility that the pool had become cognitively democratized, by looking at the scores of students who reported that they were in the top tenth, the second tenth, and the second fifth of their classes. If their scores went up while those for the entire SAT sample went down, that would be suggestive evidence (if we make certain assumptions about the consistency with which students reported their true class rank) that the pool was drawing from a cognitively broader segment of the population. Using 1980 (the end of the decline) to 1992 as the period of comparison, the Verbal scores of whites who reported they were in the top tenth, 2d tenth, and 2d fifth went up by five, seven, and eight points respectively, while that of the entire white SAT pool remained flat. In Math, the scores of the top tenth, 2d tenth, and 2d fifth went up by nine, thirteen, and fourteen points, respectively, while that of the pool rose by nine points. At first glance, this would seem to be evidence for a strong effect of cognitive democratization. But then we looked at what happened to the scores of white students reporting that they were in the 3d, 4th, and lowest fifths of their classes. Their scores went up by much more: nine, eleven, and ten points, respectively, in the Verbal; seventeen, seventeen, and nine in the Math. 
We are aware of Simpson’s paradox, which shows how scores in each interval can go up when scores in the aggregated group go down, but in this case the explanation appears to lie in changes either in the way that students report their class rank, the meaning of class rank, or both. We give “cognitive democratization” credit for two points each in the Verbal and Math, but it is not certain that even that much is warranted.

30 For an argument that the test score decline does in fact represent falling intelligence, see Itzkoff 1993.

31 For a broader discussion of falling SAT scores in the high-scoring segment of the pool, see Singal 1991.

32 From 1967, scores were reported for all test takers; from 1972 through 1976, ETS reported scores for all test takers and for college-bound seniors. To estimate college-bound seniors for 1967-1972, we computed the ratio of college-bound seniors to total test takers for the overlapping years of 1972-1976. For Verbal, the mean ratio was .82, with a high of .88 and a low of .77. For Math, the mean ratio was .78, with a high of .85 and a low of .71. The mean ratios were applied to the data from 1967 to 1972 to obtain an estimate of the number of college-bound seniors.
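The estimation step in this note, averaging the yearly ratio over the overlapping years and applying it to the earlier totals, can be sketched as follows. The function names are hypothetical. The counts in the usage lines are made-up placeholders chosen only to exercise the code. They are not ETS figures.

```python
from statistics import mean


def mean_ratio(seniors, totals):
    """Average of the yearly ratios of college-bound seniors to all test takers."""
    return mean(s / t for s, t in zip(seniors, totals))


def backcast_seniors(totals_early, ratio):
    """Apply the mean ratio to earlier totals to estimate college-bound seniors."""
    return [round(t * ratio) for t in totals_early]


# Placeholder counts for the overlapping years (NOT real data).
overlap_seniors = [800, 850, 820]
overlap_totals = [1000, 1000, 1000]

ratio = mean_ratio(overlap_seniors, overlap_totals)

# Placeholder totals for the earlier years, when only all-taker counts exist.
estimates = backcast_seniors([900, 950], ratio)
print(round(ratio, 3), estimates)
```

Averaging the yearly ratios, rather than dividing pooled sums, matches the note's report of a mean ratio together with a yearly high and low.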

33 ETS keeps careful watch on changes in item difficulty, which are called “scale drift.” It finds that scores of 650 and above were little affected by scale drift (Modu and Stern 1975; 1977).

34 The remaining possibility is that the increase in the SAT pool during the 1980s brought students into the pool who could score 700 but had not been taking the test before. This possibility is not subject to examination. It must be set against the evidence that extremely high proportions of the top students have been going to college since the early 1960s and that the best-of-the-best, represented by those who score more than 700 on the SAT, have been avidly seeking, and being sought by, elite colleges since the 1950s, which means that they have been taking the SAT. Note also that the proportion of SAT students who identify themselves as being in the top tenth of their high school class—where 700 scorers are almost certain to be—was virtually unchanged from 1981 to 1992. Finally, if highly talented new students were being drawn from some mysterious source, why did we see no improvement on the SAT-Verbal? It seems unlikely that the increase in the overall proportion of high school students taking the SAT can account for more than a small proportion, if any, of the remarkable improvement in Math scores among the most gifted during the 1980s.

35 Once again, the changes are not caused by changes in the ethnic composition of the pool (for example, by an influx of test takers who do not speak English as their native language). The trendline for whites since 1980 parallels that for the entire test population.

36 National Center for Education Statistics 1992, p. 57. We also examined the SAT achievement test results. They are harder to interpret than the SATs because the test is regularly rescaled as the population of students taking the test changes. For a description of the equating and rescaling procedures used for the achievement tests, see Donlon 1984, pp. 21-27. The effects of these rescalings, which are too complex to describe here, are substantial. For example the average student who took the Biology achievement test in 1976 had an SAT-Math score that was 71 points above the national mean; by 1992, that gap had increased to 126 points. The same phenomenon has occurred with most of the other achievement tests (Math II, the more advanced of the two math achievement tests, is an exception). Put roughly, the students who take them are increasingly unrepresentative of the college-bound seniors who take the SAT, let alone of the national population. We focused on the students scoring 700 or higher by again assuming that since the 1960s, a very high proportion of the nation’s students who could score higher than 700 on any given achievement test took the test. We examined trends on the English Composition, American History, Biology, and Math II tests from three perspectives: the students scoring above 700 as a proportion of (1) all students who took that achievement test; (2) all students who took the SAT; and (3) all 17-year-olds. Method 1 (as a proportion of students taking the achievement test) revealed flat trendlines—not surprisingly, given the nature of the rescaling. Methods 2 and 3 revealed similar patterns. 
With all the reservations appropriate to this way of examining what has happened, we find that the proportion scoring above 700 on English Composition and Math II mirrored the contrast we showed for Verbal and Math scores on the SAT: a sharp drop in the English Composition in the 1970s, with no recovery in the 1980s; an equally sharp and steep rise in the Math II scores beginning in the 1980s and continuing through the 1992 test. The results for American History and Biology were much flatter. Method 2 showed no consistent trend up or down, and only minor movement in either direction at any time. Method 3 showed similar shallow bowl-shaped curves: reductions during the 1970s, recovery during the 1980s that brought the American History results close to the first year of 1972, and brought Biology to a new high, although one that was only fractionally higher than the 1972 results. This is consistent with a broad theme that the sciences and math improved more in the 1980s than the humanities and social sciences did.

37 Diane Ravitch’s account, one of the first, is still the best (Ravitch 1983), with Finn 1991; Sowell 1992; Ravitch 1985; Boyer 1983; and Porter 1990 providing perspectives on different pieces of the puzzle and guidance to the voluminous literature in magazines and journals regarding the educational changes in elementary and secondary schools. For basic texts by advocates of the reforms, see Goodman 1962; Kohl 1967; Silberman 1970; Kozol 1967; Featherstone 1971; Illich 1970; and the one that in some respects started it all, Neill 1960.

38 Fiske 1984; Gionfriddo 1985.

39 Sowell 1992, p. 7.

40 Bishop 1993b.

41 Bejar and Blew 1981; Breland 1976; Etzioni 1975; Walsh 1979.

42 By the early 1980s, when the worst of the educational crisis had already passed, the High School and Beyond survey found that students averaged only three and a half hours per week on homework (Bishop 1993b).

43 DES 1992b, Table 132.

44 DES 1992b, Table 129. The picture is not unambiguous, however. Measured in “Carnegie units,” representing one credit for the completion of a one-hour, one-year course, high school graduates were still getting a smaller proportion of their education from academic units than from vocational or “personal” units (National Center for Education Statistics 1992, p. 69).

45 We do not exempt colleges altogether, but there are far more exceptions to the corruption as we mean it at the university level than at the high school level, in large part because high schools are so much more shaped by a few standardized textbooks.

46 Gionfriddo 1985.

47 Irwin 1992, Table 1. The programs we designated as for the disadvantaged were the Title I basic and concentration grants, Even Start, the programs for migratory children, handicapped children, neglected and delinquent children, the rural technical assistance centers, the state block grants, inexpensive book distribution, the Ellender fellowships, emergency immigrant education, the Title V (drug and alcohol abuse) state grants, national programs, and emergency grants, Title VI (dropout), and bilingual program grants.

48 DES 1992b, Table 347.

49 Calvin Lockridge, quoted in “Old debate haunts Banneker’s future,” Washington Post, March 29, 1993, p. A10.

50 Ibid.

51 Bishop 1993b.

52 For a coherent and attractive list of such reforms, see Bishop 1990b.

53 Stevenson et al. 1990.

54 E.g., 63 percent of respondents in a recent poll conducted by Mellman-Lazarus-Lake for the American Association of School Administrators thought that the nation’s schools needed “major reform,” compared to only 33 percent who thought their neighborhood schools needed major reform. Roper Organization 1993.

55 E.g., Powell, Farrar, and Cohen 1985.

56 Bishop has developed these arguments in several studies: Bishop 1988b, 1990a, 1990b, 1993a, 1993b.

57 Bishop 1993b (p. 20) cites the example of Nationwide Insurance, which in the single year of 1982 sent out over 1,200 requests for high school transcripts and got 93 responses.

58 Bishop 1988a, 1988b, 1990a, 1993a, 1993b.

59 Bishop 1990b.

60 Ibid.

61 The Wonderlic Personnel Test fits this description. For a description, see E. F. Wonderlic & Associates 1983. The value of a high school transcript applies mainly to recent high school graduates who have never held a job, so that employers can get a sense of whether this person is likely to come to work every day, on time. But after the first job, it is the job reference that will count, not what the student did in high school.

62 The purposes of such a program are primarily to put the federal government four-square on the side of academic excellence. It would not appreciably increase the number of high-scoring students going to college. Almost all of them already go. But one positive side effect would be to ease the financial burden on many middle-class and lower-middle-class parents who are too rich to qualify for most scholarships and too poor to send their children to private colleges.

Chapter 19

1 Quotas as such were ruled illegal by the Supreme Court in the famous Bakke case.

2 Except as otherwise noted, our account is taken from Maguire 1992.

3 A. Pierce et al., “Degrees of success,” Washington Post, May 8, 1991, p. A31.

4 Seven COFHE schools provided data on applicants and admitted students, but not on matriculated students. Those schools were Barnard, Bryn Mawr, Carleton, Mount Holyoke, Pomona, and Smith. The ethnic differences in scores of admitted students for these schools were in the same range as the differences for the schools shown in the figure on page 452. Yale did not supply any data by ethnicity. Data are taken from Consortium on Financing Higher Education 1992, Appendix D.

5 “Best Colleges,” U.S. News & World Report, Oct. 4, 1993, pp. 107-27.

6 Data for the University of Virginia and University of California at Berkeley are for 1988 and were obtained from Sarich 1990 and L. Feinberg, “Black freshman enrollment rises 46% at U-Va,” Washington Post, December 26, 1988, p. C1.

7 The figures for standard deviations and percentiles are based on the COFHE schools, omitting Virginia and Berkeley. The COFHE Redbook provides the SAT scores for the mean, 25th percentile, and 75th percentile by school. We computed the estimated standard deviation for the combined SATs as follows:

Estimated standard deviation for each test (Verbal and Math): given the scores for the mean and any percentile, the corresponding SD is given by (x−m)/z, where x is the score for the percentile, m is the mean, and z is the standardized score for that percentile in a normal distribution. Two separate estimates were computed for each school, based on the 25th and 75th percentiles. These two estimates were averaged to reach the estimated standard deviation for each test.

The formula for estimating the standard deviation of combined tests is $\sqrt{2\sigma^2 + 2r\sigma^2}$, where $r$ is the correlation between the two tests and $\sigma$ represents the standard deviation of the two tests. The correlation of the verbal and math SATs as administered to the entire SAT population is .67 (Donlon 1984, p. 55). The correlation for elite schools is much smaller. For purposes of this exercise, we err on the conservative side by continuing to use the correlation of .67. We further err on the safe side by using the standard deviation for the entire student population, which is inflated by the very affirmative action admissions that we are analyzing. If instead we were to use the more appropriate baseline measure, the standard deviation for the white students, the Harvard standard deviation (known from unpublished data provided by the Admissions Office) would be 105 instead of 122. For both reasons, the analysis of the gap between minority and white students in the COFHE data is understated. To give an idea of the magnitude, our procedure underestimated the known black-white gap at Harvard by 14 percent.
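
The two steps in this note (a per-test standard deviation recovered from a percentile, then a combined standard deviation) can be sketched as follows. The function names and the example scores are hypothetical. The combined formula assumes the standard identity for the sum of two correlated scores that share one standard deviation. The z values come from Python's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

def sd_from_percentile(x, m, pct):
    """SD implied by one percentile score under normality: (x - m) / z."""
    z = NormalDist().inv_cdf(pct)  # standardized score for that percentile
    return (x - m) / z

def school_sd(m, p25, p75):
    """Average the two estimates based on the 25th and 75th percentiles."""
    low = sd_from_percentile(p25, m, 0.25)
    high = sd_from_percentile(p75, m, 0.75)
    return (low + high) / 2

def combined_sd(sigma, r=0.67):
    """SD of the sum of two tests with common SD sigma and correlation r.

    Assumption: both tests share the same sigma; r defaults to the .67
    correlation the note cites for the entire SAT population.
    """
    return sqrt(2 * sigma**2 + 2 * r * sigma**2)

# Hypothetical school: mean 650, 25th percentile 600, 75th percentile 700.
per_test = school_sd(650, 600, 700)
combined = combined_sd(per_test)
```

Because z is negative below the mean, the 25th-percentile estimate comes out positive without any special handling. A lower correlation, such as the note expects at elite schools, would shrink the combined standard deviation.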

8 The Berkeley figure for Latinos is an unweighted average of Chicanos and other Latino means.

9 Scholars who have tried to do work in this area have had a tough time obtaining data, up to and including researchers from the Office for Civil Rights in the Department of Education (Chun and Zalokar 1992, note, p. 108).

10 The Berkeley figure for Latinos is an unweighted average of Chicanos and other Latino means. For Davis, only a Chicano category is broken out. Virginia had no figure for Latino students.

11 Chun and Zalokar 1992.

12 Committee on Minority Affairs 1984, p. 2.

13 Chan and Wang 1991; Hsia 1988; Li 1988; Takagi 1990; Bunzel and Au 1987.

14 K. Gewertz, “Acceptance rate increases to 76% for class of 1996,” Harvard University Gazette, May 15, 1992, p. 1.

15 F. Butterfield, “Colleges luring black students with incentives,” New York Times, Feb. 28, 1993, p. 1.

16 For Chicano and other Latino students at Berkeley, the comparative position with whites also got worse. SAT scores did not rise significantly for Latino students during the 1978-1988 period, and the net gap increased from 165 to 254 points for the Chicanos and from 117 points to 214 points for other Latinos.

17 Powers 1977, as reported with supplementary analysis in Klitgaard 1985, Table A1.6, p. 205.

18 The 12-15 range cuts off the upper 11.5 percent, 14.9 percent, and 7.5 percent of matriculants with known MCAT scores for the biological sciences, physical sciences, and verbal reasoning tests respectively. By way of comparison, the top 10 percent in the SAT-Math in 1993 was a little above 650; in the SAT-Verbal, in the high 500s.

19 Shea and Fullilove 1985, Table 4, reporting 1979 and 1983 data, indicate that blacks with MCAT scores in the 5-7 range had approximately twice the chance of admission of white students. In another glimpse, a multivariate analysis of applicants to medical school from among the undergraduates at two University of California campuses (Berkeley and Davis) during the last half of the 1970s began with the average white male applicant, who had a 17.8 percent chance of being admitted. Holding other characteristics constant, being black raised the probability of admission to 94.6 percent. Being an American Indian or Chicano raised the probability to 95.0 percent (Olmstead and Sheffrin, 1980a). An Asian with identical age and academic credentials had a 25 percent chance of admission, higher than the white probability but not statistically significantly so. Williams, Cooper, and Lee 1979 present the odds from the opposite perspective: A study of ten medical schools by the Rand Corporation found that a minority student with a 50 percent chance of admission would have had about a 5 percent chance of admission if he were white with the same qualifications.

20 Klitgaard 1985.

21 Proponents of affirmative action commonly cite preference for children of the alumni and students from distant states as a justification for affirmative action. Given the size of the racial discrepancies we have reported, it would be useful to have an open comparison of the discrepancies associated with these other forms of preference. We have found data from only one school, Harvard, where the legacy of having a Harvard parent continues to be a plus in the admissions process but small in terms of test scores. For the decade starting in 1983, the average Verbal score of alumni children admitted to Harvard was 674 compared to 687 earned by the admitted children of nonalumni; for Math scores, the comparable scores were 695 versus 718, respectively. Office of Civil Rights 1990.

22 Higham 1984. The arguments against admitting Jews were likely to mention that gentile families might not send their children to a college with “too many” Jews (institutional self-interest) or that anti-Semitism would make it hard for Jewish alumni to use their college education for society’s welfare (social utility).

23 Berger 1987.

24 Lloyd 1990; Peller 1991.

25 The formal explication of this standard is Thorndike 1971. For a discussion of how slippery the notion of “acceptable” performance can be, see Brown 1980.

26 The comparisons are based on NLSY subjects who went to the same four-year colleges and universities (again, excluding historically black schools). Excluding junior colleges eliminates problems of interpretation if different proportions of different ethnic groups attended junior colleges rather than four-year institutions. Since the framework for the analysis assumes a multiracial campus, it seemed appropriate to exclude the 103 NLSY subjects (all but 6 of whom were black) who attended historically black institutions. For the record, the mean AFQT scores of black students who first attended historically black institutions and of blacks who first attended other four-year institutions were within two IQ points of each other.

27 We used the top and bottom half of socioeconomic status rather than a more restrictive definition (such as the top and bottom quartile) to give large enough sample sizes for us to have confidence in the results. When we used the more restrictive definitions, the results showed admissions decisions that were even farther out of line with the rationale, but with small samples numbering just 15 pairs for two of the cells. The procedure for the analysis was as follows: The NLSY includes the FICE (Federal Interagency Committee on Education) code for each institution the NLSY subjects attended. This analysis is based on the first such institution attended after high school. The matching procedure sometimes creates multiple lines for one member of the pair. For example, suppose that three whites and one black have attended the same school. One may either enter the black score three times or eliminate duplicates, entering the black score only once. We consider that the elimination of duplicates is likely to introduce more error, on the assumption that the differences among colleges can be large. Imagine a sample consisting of two schools: an unassuming state teachers’ college, with three whites and three blacks in the NLSY sample, and Yale, with three whites and one black. The Yale scores are much higher than the teachers college scores. Eliminating duplicates—entering just one (high) black score for Yale instead of the same score three times—would defeat the purpose of matching schools. The figures reported in the text are thus based on means that have counted some people more than once but control for institutional effects. The mean used to compute a cell entry is the intercept of a regression in which the dependent variable is IQ score and the independent variables are the institutions, coded as a vector of nominal variables. Note that we also reproduced this analysis eliminating duplicates. 
The results are so similar that the alternative numbers could be inserted in the text without requiring the change of any of the surrounding discussion.

In addition to this form of the analysis, we examined other ways of cutting off low and high socioeconomic status, ranging from the most general, which divided the deciles into the top and bottom five, to the most extreme, which considered only the top and bottom deciles. For the latter analyses, we used the entire sample of NLSY students who attended four-year institutions, to preserve large enough sample sizes to analyze. Those results were consistent with the ones presented in the text. A positive weight attached to being black until reaching the most extreme comparison, of a white student in the bottom socioeconomic status decile compared to a black student in the top decile, at which point the edge for the black student fell to close to zero (but never actually reached zero). We further examined the results when the sample consisted of NLSY subjects who had received a bachelor’s degree (not just attended a four-year college). The pattern was identical for both blacks and Latinos, and even the magnitudes of the differences were similar except that, as in other replications, the gap between the disadvantaged white and disadvantaged black grew substantially over the one reported in the text.
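
The effect of keeping or eliminating duplicates in the matching procedure of this note can be illustrated with its own two-school example. This is a simplified sketch, not the authors' procedure: it uses plain means rather than the regression intercept the note describes, it repeats the smaller group's scores until the two groups are the same size (an assumption about how the multiple lines are formed), and every score is hypothetical.

```python
from itertools import cycle, islice
from statistics import mean

def matched_means(schools, keep_duplicates=True):
    """Return (white mean, black mean) across schools.

    schools maps a school name to {"white": [...], "black": [...]} scores.
    With keep_duplicates, the smaller group's scores are repeated so that
    each school contributes equally many lines to both groups.
    """
    white, black = [], []
    for groups in schools.values():
        w, b = groups["white"], groups["black"]
        if not w or not b:
            continue  # only schools with both groups can be matched
        if keep_duplicates:
            n = max(len(w), len(b))
            white.extend(islice(cycle(w), n))
            black.extend(islice(cycle(b), n))
        else:
            white.extend(w)
            black.extend(b)
    return mean(white), mean(black)

# Hypothetical scores; the gap within each school is exactly 10 points.
example = {
    "teachers college": {"white": [100, 100, 100], "black": [90, 90, 90]},
    "Yale": {"white": [130, 130, 130], "black": [120]},
}

with_dups = matched_means(example, keep_duplicates=True)
without_dups = matched_means(example, keep_duplicates=False)
```

With duplicates kept, the overall gap equals the 10-point gap inside each school. With duplicates eliminated, the single high score at the more selective school is underweighted and the gap widens to 17.5 points, which is the distortion the note warns about.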

28 The computation, using IQ scores, was (black mean − white mean)/(SD of all whites who attended a four-year institution as their first college). In understanding the way that affirmative action operates, we take it that the reference point is the white student population, which indeed squares with most qualitative discussions of the issue, pro and con.

29 Perhaps “low SES” for blacks meant a much worse background than “low SES” for whites? Not by much; the means for both groups were close (31st percentile for whites, 25th for blacks), and controlling for the difference did not appreciably change the story. Nor did it do any good to try to define “high” and “low” SES more strictly, such as people in the top and bottom quartiles. In that case, the disadvantaged blacks were admitted with even lower scores than disadvantaged whites, in the region of 1.5 standard deviations (depending on the specific form of the analysis), and so on through the cells in the table.

30 We use this indirect measure because other more direct measures (e.g., the number of blacks enrolling in college out of high school, or the number of persons ages 20 to 21 enrolled in school) do not go back to the 1960s and 1950s.

From 1950-1969, data are available only for “blacks and others.” Overlapping data indicate that the figure for “blacks only” in the early 1970s was stable at approximately 95 percent of the “blacks and other” figure. The data for 1950-69 represent the “blacks and other” numbers multiplied by .95. If one assumes that the proportion was somewhat higher in the 1950s and early 1960s, this produces a fractional overestimate of the upward black trendline, but so small as to be visually imperceptible in the graph on page 469.

31 Carter 1991; D’Souza 1991; Sowell 1989; Sowell 1992; Steele 1991.

32 See, for example, Sarich 1990; Lynch 1991.

33 For a review of this literature through the 1970s, see Breland 1979. Research since then has not changed the picture. See also Linn 1983; Donlon 1984, pp. 155-159.

34 As in so many matters involving affirmative action, this indirect reasoning would be unnecessary if colleges and universities were to open their data on grades to researchers.

35 Altbach and Lomotey 1991; Bunzel 1992; D’Souza 1991.

36 E.g., Carter 1991; Steele 1991.

37 National Center for Education Statistics 1992, Tables 170, 249. In the NLSY sample, among all students who first entered a four-year nonblack university, 27 percent of the whites failed to get a bachelor’s degree compared to 57 percent of the blacks and 55 percent of Latinos. “Dropout” in the NLSY is defined as having failed to have completed a bachelor’s degree by the 1990 interview, despite having once entered a four-year college. By that time, the youngest members of the NLSY were 25 years old.

38 The real discrepancy in dropout rates involved Latinos. Using the same analysis, the probability that a Latino student with an IQ of 110 would get a bachelor’s degree was only 49 percent. These results are produced when the analysis is run separately for each race.

39 A. Hu, “Hu’s on first,” Asian Week, May 12, 1989, p. 7; Consortium on Financing Higher Education 1992.

40 A. Hu, “Minorities need more support,” The Tech, Mar. 17, 1987, p. 1.

41 Carter 1991; Sowell 1992; Steele 1991; D’Souza 1991; Murray 1984.

42 There should probably also be some constraints on the spread of the ability distributions in various groups, but such specificity would be out of place here.

Chapter 20

1 This statement assumes that the violation of the 80 percent rule is statistically significant. With sufficiently small numbers of hirees or promotions, these percentages will fluctuate widely by chance.

2 The Uniform Guidelines are just guidelines, not laws. In one notable 1982 case (Connecticut v. Teal), the Supreme Court ruled that even the practice of meeting the 80 percent rule by hiring larger numbers of test passers from the protected than from the unprotected groups still falls short if the test produces disparate impact. Disparate impact, in and of itself, said the Court in Teal, deprives protected applicants of equal opportunity, even if the disproportionate numbers are corrected at the bottom line. Under this ruling, an employer who hires a given number of blacks will be violating the law if the blacks have high ability test scores, but not violating the law if the same number of blacks are hired without recourse to the scores at all, and thus are bound to have lower scores on average. This eventuality was lauded by Kelman 1991, who argues (p. 1169) that hiring a larger proportion of test-passing blacks than test-failing blacks “stigmatizes” blacks because it implicitly validates a test on which blacks on average score below whites. Better, he suggests, not to test at all, tacitly assuming that the test has no predictive power worth considering. For another view of Teal, see Epstein 1992.

3 The Hartigan Report is discussed in Chapter 3.

4 E.g., Kelman 1991.

5 Heckman and Payner 1989, p. 138.

6 The categories are based on those defined by the federal government. The professional-technical category was chosen to represent high-status jobs. The clerical category was chosen both to represent lower-status skilled jobs and also because, among those categories (others are sales workers and the craft workers), clerical is the only category that shows a visibly steeper increase after 1959 than before it. Two technical points about the graph on page 485 are important. First, the job classification system used by the Census Bureau was altered in 1983. Figures for 1983-1990 conform to the classification system in use from 1959-1982. The professional-technical category for 1983-1990 consists of the sum of the headings of “professional specialty,” “technical, sales, and administrative support,” “accountants and auditors,” and “personnel, training, and labor relations specialists.” The clerical category consists of the sum of “administrative support, including clerical,” and “cashiers.” Second, the data in the graph are for blacks only, corrected for the “blacks and others” enumeration that was used until 1973. The correction is based on the known ratio of jobs held by the “others” in “blacks and others” for overlapping data as of 1973. This assumes that the “others” (mostly Asian) held a constant proportion of clerical and professional jobs held by “blacks and others” from 1959-1973. If in fact the proportion went down (blacks acquired these jobs disproportionately), then the pre-1973 line in the graph slightly underestimates the slope of the black increase. If in fact the proportion went up (the “others” acquired these jobs disproportionately), then the pre-1973 line in the graph slightly overestimates the slope of the black increase. Note, however, that even as of 1973, blacks constituted 87.9 percent of the “black and other” population ages 18 and over, compared to 91.9 percent in 1960, so the degree of error is unlikely to be visually perceptible in the graph. 
The alternative was to show “blacks and others” consistently from 1959 into the 1990s, but from a technical perspective this becomes increasingly inaccurate as the percentage of “others” increases rapidly in the 1970s and 1980s. Visually, graphs prepared under either method show the same story.

7 The main complications are, first, that the affirmative action policies evolved over a period of time, so that the landmark events are not as decisive as they may appear to be (see Appendix 7). Second, laws and regulations often institutionalize changes that were already under way for other reasons. This seems to be clearly the case with the hiring of minorities, and it, too, tends to blunt the impact of the laws and regulations when they come along. Third, different regions of the country probably reacted to the laws and regulations differently, thereby diluting their impact in national statistics.

8 Donohue and Heckman 1991; Epstein 1990; Freeman 1984; Heckman and Payner 1989; Heckman and Verkerke 1990; Leonard 1986; Welch 1981.

9 Brown and Erie 1981 concluded that about 55 percent of the increase in black managerial, professional, and technical employment from 1960 to 1976 occurred in the public sector.

10 The classic exchange on this topic is Epstein 1992, Chap. 12; Heckman and Payner 1989.

11 The normative 1 standard deviation difference is assumed for this exercise. The observed difference in the NLSY is larger, hence would only exacerbate the conclusion suggested by the graphic on page 485.

12 Obviously, there will be employees who fall outside the range. But insofar as the tails at both ends are small and roughly equivalent, the calculation is not much affected. These particular numbers are based on the observed distribution of NLSY whites in these job categories. For clerical jobs, 90 percent of all white employees had IQs between 85.7 and 122.7, with a standard deviation of 11.3. For professional and technical jobs, 90 percent of all white employees had IQs of 98.0 and above, with a standard deviation of 11.8.

13 The assumptions used for the figure are extremely conservative. Most obviously, the standard deviation of 15 is too high. People within an occupational category will always tend to have a smaller dispersion than the general population. If we change nothing except reduce the standard deviations to 12 for both blacks and whites, in line with the observed standard deviations in the NLSY, the black-white ratios rise from 1.7 (professional-technical) and 1.6 (clerical) to 2.5 and 1.9 respectively. In addition, however, the graph on page 490 is conservative in using an IQ range that encompasses 90 percent of the white workers in an occupational category. The lower the bottom end of the range is, the more it disproportionately inflates the eligible portion of the black population (changes in the top end of the range are at the tail of the distribution and add very little to the eligible pool). Visualize the bell curve: By lowering the bottom cutoff for professional-technical professions from 100 to 98 (for example), everyone in that very fat part of the curve is treated as being just as eligible for a professional-technical occupation as anyone else—even though, in reality, they are much less likely than persons with higher IQs to get such jobs. If, for example, we base the range on the IQs that embrace 80 percent of the white workers in an occupation—more realistic in many respects—the black-white ratio in 1990 grows to 2.3 for professional-technical occupations and 1.8 for clerical. But the conclusions still hold even if we broaden the range still further than in the graph, to embrace 95 percent of all people in those occupations. 
In that case—which assumes, implausibly, that all people with IQs higher than 89.8 are equally likely to be hired for technical-professional jobs and that all people with IQs between 82.0 and 130.3 are equally likely to be hired for clerical jobs—the black-white ratio as of 1990 is still greater than 1 in both instances: 1.2 for professional-technical, 1.5 for clerical. In short, the differences produced by altering the assumptions can make substantial differences in the size of the estimates of disproportionate hiring, but even assumptions that go well beyond common sense and the available data do not change the overall conclusions drawn in the text.
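
The sensitivity argument in this note turns on how much of each normal distribution falls inside an IQ range. The sketch below illustrates that building block only and does not reproduce the ratios reported in the note. The function name is hypothetical. The means of 100 and 85 assume the normative 1 standard deviation difference mentioned in note 11, and the standard deviation of 15 is the conservative figure discussed in this note.

```python
from statistics import NormalDist

def share_in_range(mean, sd, low, high=None):
    """Share of a normal population with IQ between low and high.

    high=None means the range is open at the top.
    """
    dist = NormalDist(mean, sd)
    upper = 1.0 if high is None else dist.cdf(high)
    return upper - dist.cdf(low)

# Assumed population parameters (see the lead-in).
WHITE_MEAN, BLACK_MEAN, SD = 100, 85, 15

# Effect of lowering the bottom cutoff from 100 to 98 on each eligible share.
white_gain = share_in_range(WHITE_MEAN, SD, 98) / share_in_range(WHITE_MEAN, SD, 100)
black_gain = share_in_range(BLACK_MEAN, SD, 98) / share_in_range(BLACK_MEAN, SD, 100)

# A bounded range, using the 90 percent clerical limits quoted in note 12.
clerical_white = share_in_range(WHITE_MEAN, SD, 85.7, 122.7)
clerical_black = share_in_range(BLACK_MEAN, SD, 85.7, 122.7)
```

Under these assumptions, lowering the bottom cutoff enlarges the eligible share of the lower-mean distribution proportionally more than that of the higher-mean distribution, which is the asymmetry the note describes. Passing `sd=12` instead of 15 shows the effect of the narrower dispersion the note considers.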

14 The observations using the CPS and the NLSY are not completely independent, insofar as we took our estimate of the IQ range for clerical and professional-technical occupations from the data on NLSY whites. But those parameters did not constrain the results for blacks.

15 The sample in these analyses excluded persons who were still in school in 1990.

16 Jaynes and Williams 1989, Tables 44, 6-1.

17 Hartigan and Wigdor 1989. See also Chapters 3 and 13.

18 As of 1987, states had such a certification process. See Rudner 1988.

19 Straus and Sawyer 1986.

20 Lerner 1991.

21 In Pennsylvania, with the highest pass rates, the state commissioner of higher education openly acknowledged that Pennsylvania sought to avoid lawsuits alleging racial bias in the test by establishing a low cutoff score that they would subsequently try to raise. See H. Collins, “Minority groups are still lagging on teacher’s exam,” Philadelphia Inquirer, Aug. 5, 1989, p. B1.

22 The answer to the question of how such large differences can show up in otherwise credentialed teachers is, in effect, the topic of the preceding chapter, on affirmative action in higher education.

23 If we make the empirically more likely assumption that IQ does have a positive correlation with the nonintellectual skills, then the people with low intellectual skills will, on average, also have depressed nonintellectual job skills.

24 For examples of affirmative action programs in public bureaucracies, see Lynch 1991, pp. 24-32; Taylor 1992, Chaps. 4, 5.

25 Carlson 1993.

26 Carlson 1993, p. 28.

27 Carlson 1993, p. 30.

28 Washington Post, October 24-28, 1993.

29 Delattre 1989; Sechrest and Burns 1992.

30 Among the other stories we have located linking poor worker performance to hiring under affirmative action requirements are one reporting an increase in collisions and other accidents on the New York public transportation system (K. Foran, “TA lax on Safety,” Newsday, Sept. 19, 1990, p. 5), another describing the rise in criminal behavior among Detroit’s police officers (E. Salholz, “Going After Detroit’s rogue cops,” Newsweek, Sept. 5, 1988, p. 37), and one discussing the much higher rate of firings among Boston’s black postal workers, compared to white workers (B. McAllister, “Researchers say Postal Service tried to block article on firings,” Washington Post, Oct. 17, 1992, p. A3).

31 Silberberg 1985. See also Ford et al. 1986; Kraiger and Ford 1985.

32 Silberberg has his own interesting hypotheses about these differences, which we do not elaborate here. Nothing in his account is at variance with our conclusion that affirmative action procedures are exacting a cost in worker performance.

33 Hacker 1992, p. 25.

34 In fact, that was precisely the excuse often given by the major leagues for not hiring blacks.

35 For a detailed statement of this perspective, see Kelman 1991.

36 Quoted in Bolick 1988, p. 49. See also Taylor 1992, p. 126.

37 There is a presumption that if we cannot explain a group difference, it is appropriate to assume that there is no good reason for it. This is bad logic. Not knowing a good reason for a difference is not the same as knowing that there is no good reason.

38 We understand the argument that, in the long term, and taking the broadest possible view, if all businesses were to behave in “socially responsible” ways, there would result a better society that would provide a healthy climate for the businesses themselves. Our argument is somewhat more direct: Can a university president, thinking realistically about the foreseeable future, see that his university will be better qua university by admitting some students who are academically less qualified than their competitors? Generally, yes. Can the owner of a business, thinking realistically about the foreseeable future, see that his business will be better qua business by hiring people who are less productive than their competitors? Generally, no.

39 D. Pitt, “Despite revisions, few blacks passed police sergeant test,” New York Times, January 13, 1989, p. 1.

40 See Taylor 1992, pp. 129-137, for an account of some of the more egregious examples.

41 The largest difference, 1.6 SDs, was for persons with advanced degrees. For Latinos, the gap with whites ranged from .6 to 1.0 SDs.

42 Other approaches for contending with affirmative action constraints have surfaced. For example, New York’s Sanitation Department used a test on which 23,078 applicants out of 24,000 got perfect scores, and its Fire Department used a test with multiple choice questions for which a point of credit was given if the first choice was correct, a half-point if the second choice was correct, or a quarter-point if the third choice was correct, thereby inflating the grades of people who got many items wrong (Taylor 1992).

43 Hartigan and Wigdor 1989; Hunter and Hunter 1984.

44 For an account, see Hartigan and Wigdor 1989.

45 E. F. Wonderlic & Associates, 1983, Table 18, p. 25. The scores of Asians are lower than the national mean (in contrast to results of IQ studies) probably because the Wonderlic, a pencil-and-paper test, is language sensitive and is widely used for lower-level jobs. It seems likely that substantial proportions of Asians who take the Wonderlic are recent immigrants for whom English is a second and often newly acquired language.

46 Summarized in Lynch 1991. See also Detlefsen 1991.

Chapter 21

1 Kaus 1992. Kaus’s analysis runs parallel with our own in many respects—among other things, in his use of the Herrnstein syllogism (Herrnstein 1971, 1973) to think about the stratifying influence of intelligence.

2 The remark appeared in the manuscript of The End of Equality. It is used here with permission of the author.

3 Quoted in Novak 1992, p. 24.

4 Surveys by the Roper Organization (Roper Reports 92-5), as reported in American Enterprise (May-June 1993): 86.

5 U.S. Bureau of the Census 1992, Table B-6, 1975.

6 U.S. Bureau of the Census, 1991, Table B-3. All data are based on pretax income, so the tax reforms of the 1980s are not implicated.

7 Reich 1991.

8 Voting estimated from Jennings 1991, Tables 7, 10, 13.

9 Overall, 19.2 percent of children born to NLSY women from the mid-1970s through 1990 were born to unmarried mothers with below-average IQs. The national illegitimacy ratio grew steadily throughout that period.

10 “White” includes births to Caucasian Latinos. The National Center for Health Statistics has provided Latino/non-Latino breakdowns only since 1986. During that period, the non-Latino white illegitimacy ratio increased from 13.2 percent to 18.0 percent in 1991, the latest figures as we write.

11 Data refer to poverty in the year prior to birth, and to non-Latino and Latino whites combined, to be consistent with the use of “white” in this discussion. The proportions for non-Latino white women above and below the poverty line were quite similar, however: 6 percent and 44 percent respectively.

12 Unpublished detailed tables for Bachu 1993, available from the Bureau of the Census.

13 These continue to be figures for Latino and non-Latino whites combined. The figures for non-Latino whites may be found in Chapter 8. They are not so different (because non-Latino whites so dominate the total). Seventy-two percent of illegitimate children of non-Latino white mothers in the NLSY had IQs below 100, and 39 percent had IQs below 90.

14 Wilson 1987. For a complementary view, see Massey and Denton 1993.

15 In the NLSY, blacks from the lowest quartile of socioeconomic background had a mean IQ equivalent of 82.

16 For an early statement of this argument, see Murray 1988a.

17 Jencks and Peterson 1991.

18 Chapter 16 discussed some of these efforts with regard to intelligence. For broader-ranging assessments, see Murray 1984; Stromsdorfer 1987; Rossi 1987; Glazer 1988.

Chapter 22

1 The phrasing draws from Rawls 1971, pp. 14-15.

2 For discussion of this transformation, see, for example, Brown 1988.

3 Thomas Hobbes postulated an axiom—Hobbes saw it as literally an axiom, in the mathematical sense—for governing people with equal rights to liberty: “That a man be willing, when others are so too … to lay down this right to all things; and be contented with so much liberty against other men, as he would allow other men against himself.” Hobbes 1651, Chap. 14.

4 Hobbes expressed the gloomy prospect of perfect anarchy in the one sentence for which he is best remembered: “And the life of man [would be] solitary, poore, nasty, brutish and short.” Hobbes 1651, Chap. 13.

5 Locke 1689, Second Treatise, sec. 4.

6 Locke 1689, Bk. IV, Chap. XX.

7 See, for example, Wills 1978; Beer 1993.

8 Mayo 1942, pp. 77-78.

9 Costopoulos 1990, p. 50.

10 Costopoulos 1990, p. 47.

11 Mayo 1942, p. 78.

12 Costopoulos 1990, p. 47.

13 Quoted in Diamond 1976, p. 16.

14 Costopoulos 1990, p. 48.

15 That fact, combined with the “irresistible corruption” that Adams saw as infecting all political systems, caused him to be deeply pessimistic about the survival of the experiment in human government that he had been so instrumental in founding. He sometimes wondered gloomily whether a hereditary aristocracy on the British model might be necessary to offset the unrestrained avarice and factiousness of Jefferson’s natural aristocracy.

16 Aristotle 1905 ed., p. 207.

17 Hamilton et al. 1787, No. 10.

18 White 1958, p. 122.

19 Huber 1988; Olson 1991.

20 Bureau of Labor Statistics 1982, Table C-23, 1989, Table 42.

21 In 1990 dollars in all cases: the annual income of male year-round, full-time nonfarm, non-mine laborers was $16,843 in 1958 (SAUS 1970, Table 347). The comparable earnings for “handlers, equipment cleaners, helpers, and laborers” in 1991 were $16,777 (U.S. Bureau of the Census 1992, Table 32). The full-time weekly earnings of “lower-skilled labor” in 1920 were $169 in 1990 dollars, or $8,459 for a fifty-week year (U.S. Bureau of the Census 1975, Series D 765-778).

22 For a full presentation of the following argument, see Murray 1988b, Chap. 12.

23 Wilson 1993.

24 It is doubtless harder even for bright people to lead law-abiding lives when the laws become more complex, but the marginal effects will be smaller on them than on the less bright.

25 Ellwood 1988.

26 For an accessible discussion of the pros and cons of the EITC, see Kosters 1993. A more ambitious approach that we think deserves consideration would replace the entire structure of federal transfers to individuals—income supplements, welfare, in-kind benefits, farm subsidies, and even social security—with a negative income tax of the kind proposed by Milton Friedman in Friedman 1962. Like Friedman, we are attracted to this strategy only if it replaces everything else, a possibility so unlikely that it is hard to talk about seriously. This does not diminish its potential merit.

Short citations refer to works that are already cited in full in the bibliography.

1 M. W. Browne, What is intelligence, and who has it? New York Times Book Review, October 16, 1994, pp. 3-6.

2 Murray 1984, 1988b.

3 Herrnstein 1973.

4 M. Novak, Sins of the cognitive elite, National Review, December 5, 1994, pp. 58-61. T. Sowell, Can we find a way to discuss intelligence intelligently? Washington Times, October 21, 1994.

5 Ibid., p. 59.

6 Gould 1981; Gardner 1983.

7 Snyderman and Rothman 1988.

8 S. J. Gould, Curveball, New Yorker, November 28, 1994, pp. 143-144.

9 Steve Blinkhorn, quoted in B. D. Davis, Neo-Lysenkoism, IQ, and the press, Public Interest, no. 73 (1983), 44. The Davis article is an illuminating review of the contrasting receptions of The Mismeasure of Man accorded by the press and by the scientific community.

10 J. B. Carroll, Human Cognitive Abilities: A Survey of Factor-Analytic Studies (Cambridge: Cambridge University Press, 1993).

11 For examples, see Jensen 1987 or B. Bower, Images of intellect: Brain scans may colorize intelligence, Science News, October 8, 1994, pp. 236-237.

12 Mainstream science on intelligence, Wall Street Journal, December 13, 1994.

13 For a recent and comprehensive presentation of Rushton’s argument and evidence, see J. P. Rushton, Race, Evolution, and Behavior (New Brunswick, N.J.: Transaction, 1994).

14 Ibid., chapter 6.

15 J. Rosen and C. Lane, Neo-Nazis! New Republic, October 31, 1994, pp. 14-15; C. Lane, The tainted sources of “The Bell Curve,” New York Review of Books, December 1, 1994.

16 L. J. Kamin, Behind the curve, Scientific American (February 1995): 99-103.

17 K. Owen, The suitability of Raven’s Standard Progressive Matrices for various groups in South Africa, Personality and Individual Differences 13 (1992): 149-159.

18 F. Zindi, Differences in psychometric performance, Psychologist 7 (1994): 549-554.

19 Kamin 1995, p. 103.

20 J. J. Heckman, Cracked bell, Reason (March 1995): 53.

21 A. Goldberger, Journal of Economic Literature 33 (1995): 762-776.

22 Kamin 1995, p. 102.

23 As I write, I have learned of just one computational error out of the hundreds of statistical results presented in the book, in the table on p. 591 (Appendix 3) in the hardcover edition, caused by the miscoding of nine cases. The numbers have been corrected for this edition. The changes did not require any alteration in the wording of the discussion.

24 R. Nisbett, Race, IQ, and scientism, in S. Fraser (ed.), The Bell Curve Wars: Race, Intelligence, and the Future of America (New York: Basic Books, 1995), p. 45.

25 Jeanne Brooks-Gunn et al., Early intervention in low-birth-weight premature infants, JAMA 272 (1994): 1257-1262.

26 Howard Gardner, Cracking open the IQ box, American Prospect (Winter 1994): 71-80. Lisbeth Schorr and Daniel Schorr, Within Our Reach: Breaking the Cycle of Disadvantage (New York: Doubleday, 1988).

Appendix 1

1 The figure depicts 250 18-year-old males drawn randomly from the NLSY sample.

2 Based on the NLSY subjects, born from 1957 through 1964, as of 1982, when the youngest was 18 years old, the mean height of contemporary Americans is a little over 5 feet 7 inches, with a standard deviation of about 4 inches.

3 Based on the 1983 ETS norm study (Braun and King 1987) and dropout rates in the 1980s, we estimate the mean for all 18-year-olds (including dropouts) at 325, with a standard deviation of 105. This would indicate that the 99th centile begins at a score of 569. The example in the text is phrased conservatively.
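
The centile arithmetic in this note can be checked with a short sketch. It assumes that the scores are approximately normally distributed, which the note's figures imply but do not state outright; the variable names are illustrative only.

```python
from statistics import NormalDist

# Figures taken from the note: estimated mean and standard deviation
# for all 18-year-olds, including dropouts.
MEAN_SCORE = 325
SD_SCORE = 105

# Assumption (not stated in the note): scores follow a normal distribution.
scores = NormalDist(mu=MEAN_SCORE, sigma=SD_SCORE)

# Score at which the 99th centile begins, i.e. the cutoff for the top 1 percent.
cutoff_99 = scores.inv_cdf(0.99)

print(round(cutoff_99))
```

Under that assumption the cutoff lies about 2.33 standard deviations above the mean, which rounds to the 569 given in the note.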

4 The Pearson’s r is .501 in both cases. The number 3,068 refers to males with weight and height data in 1982.

5 For simplicity’s sake, we are assuming that the variables can have only linear relationships with each other.

Appendix 2

1 The NLSY on CD-ROM disk is available for a nominal fee from the Center for Human Resource Research, Ohio State University.

2 Inquiries should be directed to Prof. Richard J. Herrnstein, Department of Psychology, William James Hall, Harvard University, Cambridge, MA 02138, or to Dr. Charles Murray, American Enterprise Institute, 1150 17th St. NW, Washington, DC 20036.

3 Data for 1991 had become available in time to be used for the analysis, but for budgetary reasons, the NLSY had to cut the supplementary sample of low-income whites as of 1991. We decided that the advantages of including low-income whites in the analysis outweighed the advantages of an additional year of data.

4 We followed the armed forces’ convention of limiting subtest scores to a maximum of three standard deviations from the mean. We gratefully acknowledge the assistance of Dr. Malcolm J. Ree, who led the revision of the AFQT, in computing the revised scores for the NLSY.

5 This procedure is facilitated by the large sample sizes (at least 1,265 with valid AFQT scores in each birth year, which are as large as the samples commonly used for national norms in tests such as the WISC and WAIS), and the fact that the NLSY sample was balanced for ethnic group and gender within birth years.

6 We also experimented with groupings based not on the calendar year, but the school year. The differences in centile produced by the two procedures were never as much as two, so we remained with calendar year as the basis.

7 See Users Guide 1993, pp. 157-162.

Appendix 3

1 The subtests are General Science (GS), Arithmetic Reasoning (AR), Word Knowledge (WK), Paragraph Comprehension (PC), Numerical Operations (NO), Coding Speed (CS), Auto/Shop Information (AS), Mathematics Knowledge (MK), Mechanical Comprehension (MC), and Electronics Information (EL). Two subtests (Numerical Operations and Coding Speed) are highly speeded; the other eight are “power” rather than speed tests.

2 Ree and Earles 1990a, 1990b, 1991c.

3 We use the term factor in a generic sense. Within psychometrics, terms like factor and component are used selectively, depending on the particular method of analysis used to extract the measures.

4 E.g., Gould 1981.

5 Jensen 1987a, 1987b; Ree and Earles 1991c; Welsh, Watson, and Ree 1990.

6 To account for literally 100 percent of the variance takes ten factors (because there are ten subtests), with the final few of them making increasingly negligible contributions. In the case of ASVAB, the final five factors collectively account for only 10 percent of the total variance in scores.

7 Sperl, Ree and Steuck 1990.

8 Carroll 1988; Jensen 1987a.

9 Ree and Earles, 1990a, 1990b, 1991c.

10 Gordon 1984; Jensen and Figueroa 1975.

11 Note that the General Science subtest and the Electronics Information subtest are as highly g-loaded as the subtests used in the AFQT. Why not use them as well? Because they draw on knowledge that is specific to certain courses that many youths might not have taken, whereas the mathematics and reading subtests require only material that is ordinarily covered in the courses taken by every student who goes to elementary and secondary school. But this is a good illustration of a phenomenon associated with IQ tests: People who acquire knowledge about electronics and science also tend to have high mathematics and verbal ability.

12 Jensen 1980, Table 6.10.

13 Within a single test, the test score might mean any of several percentile scores, depending on the age of the student; hence the reason for using percentiles. For the analyses in the text, scores were used only if both a test score and a percentile were recorded. Anomalous scores were discarded as follows: For the California Test of Mental Maturity, one test score of 700. For the Otis-Lennon Mental Ability Test, eight cases in which the test score was under 30 and the percentile was over 70; one case in which the test score was 176 and the percentile was only 84. For the Henmon-Nelson Test of Mental Maturity, one test score of 374. For the Differential Aptitude Test, sixteen test scores over 100. For the Lorge-Thorndike Intelligence Test and the Kuhlmann-Anderson Intelligence Test, which showed uninterpretable scatter plots of test scores against percentiles, cases were retained if the test score normed according to a mean of 100 and a standard deviation of 15 was within 10 centiles of the reported percentile score. The number of eligible scores on the Stanford-Binet and the Wechsler Intelligence Scale for Children (18 and 16, respectively) was too small to analyze.

14 Jensen 1980, Table 8.5.

15 This list is taken from Jensen 1980, p. 72. Jensen devotes a chapter (Chap. 4) to the distribution of mental ability, which we recommend as an excellent single source for readers who want to pursue this issue.

16 For an exploration of the relationships as of the late 1960s, see Jencks et al. 1972, Appendix B. For separate studies, see Rutter 1985; Hale, Raymond, and Gajar 1982; Wolfe 1982; Schiff and Lewontin, 1986.

17 Husén and Tuijnman, 1991. See also Ceci 1991, for a case that schooling has a greater influence on IQ than has generally been accepted, drawing heavily on data from earlier decades when the natural variation in schooling was large.

Appendix 5

1 Validity is measured by the correlation between predictor and outcome, which, multiplied by the ratio of the standard deviations of the outcome to the predictor, gives the regression coefficient of the outcome on the predictor. To keep this discussion simple, we assume an increasing monotonic relationship between the validity and the regression coefficient here. For a discussion that does not make this simplifying assumption, see Jensen 1980.
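
The relationship stated in this note, that the regression coefficient equals the validity multiplied by the ratio of the outcome's standard deviation to the predictor's, can be illustrated with a minimal sketch. The data and function names below are hypothetical and chosen only for the illustration.

```python
from statistics import mean, pstdev

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (pstdev(xs) * pstdev(ys))

def ols_slope(xs, ys):
    """Least-squares regression coefficient of ys on xs."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Hypothetical data: predictor (test score) and outcome (performance rating).
test_scores = [85, 90, 100, 105, 110, 120]
performance = [2.1, 2.4, 3.0, 2.9, 3.6, 3.8]

validity = pearson_r(test_scores, performance)

# Regression coefficient recovered from the validity and the SD ratio.
slope_from_validity = validity * pstdev(performance) / pstdev(test_scores)

# Regression coefficient computed directly by least squares.
slope_direct = ols_slope(test_scores, performance)

print(slope_from_validity, slope_direct)
```

The two slopes agree up to floating-point rounding, so for fixed standard deviations a higher validity means a proportionally steeper regression line.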

2 In the following sources, one can find varying estimates of the magnitude of predictive validity of intelligence tests and varying opinions about whether the tests are a net benefit to society, but they unanimously accept the conclusion that no bias against blacks in educational or occupational prediction has been found: Breland 1979; Crouse and Trusheim 1988; Hartigan and Wigdor 1989; Hunter and Schmidt 1990; Jensen 1980; Klitgaard 1985; Reynolds and Brown 1984; Schmidt 1988.

3 For a discussion of the sources of error and their relevance to meta-analyses of occupational outcomes in particular, see Hunter and Schmidt 1990. For a more general discussion, including educational outcomes, see Jensen 1980.

4 Jensen 1984b, p. 523.

5 Occasionally, one may find a study that finds differential predictive validity for one ethnic group or another for a particular test—e.g., the K-ABC test for Latinos and non-Latino whites (Valencia and Rankin 1988). But even for Latinos, validity generalization has generally been confirmed (e.g., Reynolds and Gutkin 1980; Valdez and Valdez 1983).

6 Jensen 1980, Table 10.4.

7 Breland 1979, Table 3b.

8 Ibid.

9 Hartigan and Wigdor 1989, Table 9.5.

10 Ibid., pp. 181-182.

11 The example given here is a special case of a more general phenomenon: As long as the product of the regression coefficient (which is assumed not to differ for the groups) and the mean difference between groups in the predictor is smaller than the mean difference in the outcome, there will be overprediction for the lower-scoring group.
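
The condition in this note can be made concrete with a small sketch. It assumes, as the note does, a regression coefficient shared by both groups, and it further assumes that the higher-scoring group's regression line is used to predict the lower-scoring group's mean outcome. The function name and the numbers are hypothetical.

```python
def overprediction(slope, predictor_gap, outcome_gap):
    """Predicted minus actual mean outcome for the lower-scoring group.

    slope         -- regression coefficient, assumed equal for both groups
    predictor_gap -- higher group's mean minus lower group's mean on the predictor
    outcome_gap   -- higher group's mean minus lower group's mean on the outcome

    The higher group's line passes through its own means, so at the lower
    group's predictor mean it predicts (higher outcome mean - slope * predictor_gap).
    Subtracting the lower group's actual outcome mean leaves
    outcome_gap - slope * predictor_gap. A positive value means overprediction.
    """
    return outcome_gap - slope * predictor_gap

# Hypothetical values in standard-deviation units.
print(overprediction(slope=0.5, predictor_gap=1.0, outcome_gap=0.7))  # positive: overprediction
print(overprediction(slope=0.5, predictor_gap=1.0, outcome_gap=0.5))  # zero: no misprediction
print(overprediction(slope=0.5, predictor_gap=1.0, outcome_gap=0.3))  # negative: underprediction
```

The first call satisfies the note's condition, since the slope times the predictor gap (0.5) is smaller than the outcome gap (0.7), and the result is positive.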

12 For a review of the literature through the early 1980s, see Jensen 1985, also discussed in Chapter 13. For studies since then, see Braden 1989; Jensen 1992, 1993b. The single contrary study extant is Gustafsson 1992.

13 McGurk 1951. Also in 1951, Kenneth Eells’s doctoral thesis at the University of Chicago showed that test item difficulty did not vary much across white ethnics of different types, thereby failing to support the intuition that cultural factors are dominant (Eells et al. 1951). See Jensen 1980, Chap. 11, for more on McGurk’s and Eells’s work and on other early studies of test item bias.

14 For a review of the literature through the late 1970s, see Jensen 1980, Chap. 11. For studies since 1980, see Bart et al. 1986; Ross-Reynolds and Reschly 1983; Sandoval et al. 1983; Jensen and McGurk 1987; Cook 1987; Koh, Abbatiello, and McLoughlin 1984; Reschly and Ross-Reynolds 1982; Mishra 1983. All found no item differences, or differences that explained only a fraction of the differences in group scores. Are there any exceptions? We identified one such study for blacks (Montie and Fagan 1988), based on 3-year-olds. There may very well be other studies of similar size (the sample in Montie and Fagan was 86) that are lurking in the literature, but we know of no studies using large-scale representative samples that establish item bias against blacks. Some studies of Latinos have found evidence of bias, mostly associated with Spanish and English language characteristics. See Valencia and Rankin 1988; Whitworth and Chrisman 1987; Munford and Munoz 1980. But the factor structure of the test results has generally been found to be the same for Latinos and non-Latinos (e.g., see Mishra 1981).

15 See Jensen 1980, Table 11.12. Also see Miele 1979.

16 Scheuneman 1987.

17 For a literature review, see Jensen 1980, Chap. 12.

18 Dyer 1970.

19 For studies specifically dealing with differential racial effects of coaching and practice through the late 1970s, see Baughman and Dahlstrom 1968; Costello 1970; Dubin, Osburn, and Winick 1969; Jensen 1980. For studies bearing on the issue since 1980 (but not addressing it as directly as the earlier ones), see Powers 1987; Terrell and Terrell 1983; Johnson and Wallace 1989; Cole 1987.

20 For literature reviews, see Sattler and Gwynne 1982; Jensen 1980.

21 For a literature review, see Jensen 1980, Chap. 12.

22 For a literature review, see Jensen 1980, Chap. 12.

23 Jensen 1980, Chap. 12. See also note 14 regarding item bias for Latinos.

24 Jensen 1980, Chap. 12.

25 Quay 1971, 1972, 1974.

26 Farrell 1983 and the attached responses.

27 Johnson et al. 1984; Frederiksen 1986; Johnson 1988; Kerr et al. 1986; Madhere 1989; Scheuneman 1987; White et al. 1988.

28 Rock et al. 1985 details the changes between the two administrations, concluding that “the cautious position would be that neither administration had an advantage. A less cautious conclusion is that the 1980 subjects probably had some small advantage” (p. 18).

29 Based on the white standard deviation for 1980, the first year that standard deviations by race were published.

30 Congressional Budget Office, 1986, Fig. E-3.

31 Contrary to popular belief, on the proposition whether brain size is correlated with IQ, the evidence strongly favors the pros over the cons, even after correcting for stature. A sampling of contemporary positions in this mini-controversy is Cain and Vanderwolf 1990; Gould 1978, 1981; Lynn 1989; Michael 1988; Passingham 1982; Rushton 1990d, in press; Valen 1974. Brain size is, however, not necessarily wholly determined by the genes; it could also be associated with nutrition or general health.

32 The Rushton controversy has unfolded in a rapidly expanding scholarly literature. Some of the papers, pro and con, are Cain and Vanderwolf 1990; Lynn 1989b; Roberts and Gabor 1990; Rushton 1985, 1987, 1988, 1990a, 1990b, 1990c, 1990d, 1991a, 1991b; Rushton and Bogaert 1978, 1988; Silverman 1990; Weitzmann et al. 1990; Zuckerman and Brody 1988. For further substantiation of some of the race differences that Rushton invokes, see Ellis and Nyborg 1992; Lynn 1990c; Mangold and Powell-Griner 1991; Rowe, Rodgers, and Meseck-Bushey 1988; Valen 1974.

33 Almost as all-encompassing a thesis as Rushton’s is Richard Lynn’s account of the evolution of racial differences in intelligence in terms of the ancestral migrations of groups of early hominids from the relatively benign environments of Africa to the harsher and more demanding Eurasian latitudes (Lynn 1991c), where they branched into the Caucasoids and Mongoloids. Such theories were not uncommon among anthropologists and biologists of a generation or two ago (e.g., Darlington 1969). As the biological outlook on human behavior became controversial, this kind of theorizing has almost vanished. The modern version relies much more on psychological measurements of contemporary populations than the earlier version.

Appendix 7

1 For a comprehensive discussion, see Epstein 1992.

2 Any one of these court cases may involve heroic efforts: “Some courts have expressed concern at the spectacle of trials lasting for weeks, following years of discovery, and involving a multitude of statistical and other experts and seemingly endless testimony about the credentials of a single [job] candidate.” Bartholet 1982, p. 1002.

3 Quoted in Patterson 1989, p. 87.

4 Patterson 1989.

5 Patterson 1989.

6 401 U.S. 424 (1971).

7 Lynch 1991; Murray 1984; Patterson 1989.

8 For a clear account, see Patterson 1989.

9 401 U.S. 432.

10 Ibid.

11 There is good evidence that the Duke Power Co. had no discriminatory intent in using the test or the educational credential; it was using the same criteria at a time when it was frankly pursuing a race-segregationist hiring policy. This earlier conduct gives credence to its claim that it wanted to improve its employees’ intellectual level.

12 Some legal scholars criticize the Court for not having interpreted the Constitution itself, in the Fourteenth Amendment, as providing protection against disparate impact (e.g., Tribe 1988).

13 Ironically, the particular wording in the relevant part of Title VII was an accommodation to one of the act’s most uneasy opponents, Senator John Tower of Texas, who was concerned that the law not be used in precisely the manner that, in Griggs, the court ruled that it should be used (Wilson 1972).

14 For an excellent discussion, see Epstein 1992, whose reading of the record strongly confirms ours. Epstein makes the point that had the Congress known in 1964 what interpretation the Court was to place on Title VII in Griggs, it “would have gone down to thundering defeat” (p. 197). From the legislative record, that appears to us to be a fair assessment.

15 Quoted in Wilson 1972, pp. 854ff.

16 Quotes attributed to S. Rep. 92-415, 92d Cong., 1st sess. 5 (1971), the report of the Senate Committee on Labor and Public Welfare, in Patterson 1989.

17 Wilson 1972.

18 Bartholet 1982, p. 958.

19 422 U.S. 405 (1975).

20 Our discussion here has drawn on Braun 1992.

21 Courts other than the Supreme Court have imposed on the employer itself the burden of seeking less discriminatory alternatives (Patterson 1989).

22 For references to the relevant government documents, see Patterson 1989.

23 For a similar conclusion, and some detail to back it up, see Potter 1986.

24 490 U.S. 642 (1989).

25 490 U.S. 659.

26 Cathcart and Snyderman 1992.