
The Bell Curve: Intelligence and Class Structure in American Life - Richard J. Herrnstein, Charles Murray (1996)

Part I. The Emergence of a Cognitive Elite

Chapter 3. The Economic Pressure to Partition

What accounts for the way that people with different levels of IQ end up in different occupations? The fashionable explanation has been education. People with high SAT scores get into the best colleges; people with high GRE, MCAT, or LSAT scores get into professional and graduate schools; and the education defines the occupation. The SAT score becomes unimportant once the youngster has gotten into the right college or graduate school.

Without doubt, education is part of the explanation; physicians need a high IQ to get into medical school, but they also need to learn the material that medical school teaches before they can be physicians. Plenty of hollow credentialing goes on as well, if not in medicine then in other occupations, as the educational degree becomes a ticket for jobs that could be done just as well by people without the degree.

But the relationship of cognitive ability to job performance goes beyond that. A smarter employee is, on the average, a more proficient employee. This holds true within professions: Lawyers with higher IQs are, on the average, more productive than lawyers with lower IQs. It holds true for skilled blue-collar jobs: Carpenters with high IQs are also (on average) more productive than carpenters with lower IQs. The relationship holds, although weakly, even among people in unskilled manual jobs.

The magnitude of the relationship between cognitive ability and job performance is greater than once thought. A flood of new analyses during the 1980s established several points with large economic and policy implications:

Test scores predict job performance because they measure g, Spearman’s general intelligence factor, not because they identify “aptitude” for a specific job. Any broad test of general intelligence predicts proficiency in most common occupations, and does so more accurately than tests that are narrowly constructed around the job’s specific tasks.

The advantage conferred by IQ is long-lasting. Much remains to be learned, but the smarter employee tends to remain more productive than the less smart employee even after years on the job.

An IQ score is a better predictor of job productivity than a job interview, reference checks, or college transcript.

Most sweepingly important, an employer that is free to pick among applicants can realize large economic gains from hiring those with the highest IQs. An economy that lets employers pick applicants with the highest IQs is a significantly more efficient economy. Herein lies the policy problem: Since 1971, Congress and the Supreme Court have effectively forbidden American employers from hiring based on intelligence tests. How much does this policy cost the economy? Calculating the answer is complex, so estimates vary widely, from what one authority thinks was a lower-bound estimate of $80 billion in 1980 to what another authority called an upper-bound estimate of $13 billion for that year.

Our main point has nothing to do with deciding how large the loss is or how large the gain would be if intelligence tests could be freely used for hiring. Rather, it is simply that intelligence itself is importantly related to job performance. Laws can make the economy less efficient by forbidding employers to use intelligence tests, but laws cannot make intelligence unimportant.

To this point in the discussion, the forces that sort people into jobs according to their cognitive ability remain ambiguous. There are three main possibilities, hinted at in the previous chapter but not assessed.

The first is the standard one: IQ really reflects education. Education imparts skills and knowledge—reading, writing, doing arithmetic, knowing some facts. The skills and knowledge are valuable in the workplace, so employers prefer to hire educated people. Perhaps IQ, in and of itself, has something to do with people’s performance at work, but probably not much. Education itself is the key. More is better, for just about everybody, to just about any level.

The second possibility is that IQ is correlated with job status because we live in a world of artificial credentials. The artisan guilds of old were replaced somewhere along the way by college or graduate degrees. Most parents want to see their children get at least as much education as they got, in part because they want their children to profit from the valuable credentials. As the society becomes richer, more children get more education. As it happens, education screens for IQ, but that is largely incidental to job performance. The job market, in turn, screens for educational credentials. So cognitive stratification occurs in the workplace, but it reflects the premium put on education, not on anything inherent in either education or cognitive ability itself.

The third possibility is that cognitive ability itself—sheer intellectual horsepower, independent of education—has market value. Seen from this perspective, the college degree is not a credential but an indirect measure of intelligence. People with college degrees tend to be smarter than people without them and, by extension, more valuable in the marketplace. Employers recruit at Stanford or Yale not because graduates of those schools know more than graduates of less prestigious schools but for the same generic reason that Willie Sutton gave for robbing banks. Places like Stanford and Yale are where you find the coin of cognitive talent.

The first two explanations have some validity for some occupations. Even the brightest child needs formal education, and some jobs require many years of advanced training. The problem of credentialing is widespread and real: the B.A. is a bogus requirement for many management jobs, the requirement for teaching certificates often impedes hiring good teachers in elementary and secondary schools, and the Ph.D. is irrelevant to the work that many Ph.D.s really do.

But whatever the mix of truth and fiction in the first two explanations, the third explanation is almost always relevant and almost always ignored. The process described in the previous chapter is driven by a characteristic of cognitive ability that is at once little recognized and essential for understanding how society is evolving: intelligence is fundamentally related to productivity. This relationship holds not only for highly skilled professions but for jobs across the spectrum. The power of the relationship is sufficient to give every business some incentive to use IQ as an important selection criterion.

That in brief is the thesis of the chapter. We begin by reviewing the received wisdom about the links between IQ and success in life, then the evidence specifically linking cognitive ability to job productivity.

THE RECEIVED WISDOM

“Test scores have a modest correlation with first-year grades and no correlation at all with what you do in the rest of your life,” wrote Derek Bok, then president of Harvard University, in 1985, referring to the SATs that all Harvard applicants take.1 Bok was poetically correct in ways that a college president understandably wants to emphasize. A 17-year-old who has gotten back a disappointing SAT score should not think that the future is bleak. Perhaps a freshman with an SAT math score of 500 had better not have his heart set on being a mathematician, but if instead he wants to run his own business, become a U.S. senator, or make a million dollars, he should not put aside those dreams because some of his friends have higher scores. The link between test scores and those achievements is dwarfed by the totality of other characteristics that he brings to his life, and that’s the fact that individuals should remember when they look at their test scores. Bok was correct in that, for practical purposes, the futures of most of the 18-year-olds that he was addressing are open to most of the possibilities that attract them.

President Bok was also technically correct about the students at his own university. If one were to assemble the SATs of the incoming freshmen at Harvard and twenty years later match those scores against some quantitative measure of professional success, the correlation could be modest, for reasons we shall discuss. Indeed, if the measure of success was the most obvious one, cash income, then the relationship between IQ and success among Harvard graduates could be less than modest; it could be nil or even negative.2

Finally, President Bok could assert that test scores were meaningless as predictors of what you do in the rest of your life without fear of contradiction, because he was expressing what “everyone knows” about test scores and success. The received wisdom, promulgated in feature stories in the press and codified in landmark Supreme Court decisions, has held that, first of all, the relation between IQ scores and job performance is weak, and, second, whatever weak relationship there is depends not on general intellectual capacity but on the particular mental capacities or skills required by a particular job.3

There have been several reasons for the broad acceptance of the conclusions President Bok drew. Briefly:

A Primer on the Correlation Coefficient

We have periodically mentioned the “correlation coefficient” without saying much except that it varies from −1 to +1. It is time for a bit more detail, with even more to be found in Appendix 1. As in the case of standard deviations, we urge readers who shy from statistics to take the few minutes required to understand the concept. The nature of “correlation” will be increasingly important as we go along.

A correlation coefficient represents the degree to which one phenomenon is linked to another. Height and weight, for example, have a positive correlation (the taller, the heavier, usually). A positive correlation is one that falls between zero and +1, with +1 being an absolutely reliable, linear relationship. A negative correlation falls between 0 and −1, with −1 also representing an absolutely reliable, linear relationship, but in the inverse direction. A correlation of 0 means no linear relationship whatsoever.4
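For readers who want to see the arithmetic, the following sketch (in Python) computes a correlation coefficient from simulated height and weight data. The means, standard deviations, and sample size are our own illustrative assumptions, not estimates from any real survey.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1_000
    height = rng.normal(170, 10, n)                    # centimeters (illustrative)
    weight = 0.5 * height - 15 + rng.normal(0, 7, n)   # kilograms, loosely tied to height

    # The correlation coefficient summarizes how tightly the two move together.
    r = np.corrcoef(height, weight)[0, 1]
    print(f"correlation of height and weight: r = {r:.2f}")   # about .6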

A crucial point to keep in mind about correlation coefficients, now and throughout the rest of the book, is that correlations in the social sciences are seldom much higher than .5 (or lower than −.5) and often much weaker—because social events are imprecisely measured and are usually affected by variables besides the ones that happened to be included in any particular body of data. A correlation of .2 can nevertheless be “big” for many social science topics. In terms of social phenomena, modest correlations can produce large aggregate effects. Witness the prosperity of casinos despite the statistically modest edge they hold over their customers.

Moderate correlations mean many exceptions. We all know people who do not seem all that smart but who handle their jobs much more effectively than colleagues who probably have more raw intelligence. The correlations between IQ and various job-related measures are generally in the .2 to .6 range. Throughout the rest of the book, keep the following figure in mind, for it is what a highly significant correlation in the social sciences looks like. The figure uses actual data from a randomly selected 1 percent of a nationally representative sample, using two variables that are universally acknowledged to have a large and socially important relationship, income and education, with the line showing the expected change in income for each increment in years of education.5 For this sample, the correlation was a statistically significant .33, and the expected value of an additional year of education was an additional $2,800 in family income—a major substantive increase. Yet look at how numerous are the exceptions; note especially how people with twelfth-grade educations are spread out all along the income continuum. For virtually every topic we will be discussing throughout the rest of the book, a plot of the raw data would reveal as many or more exceptions to the general statistical relationship, and this must always be remembered in trying to translate the general rule to individuals.

[Figure: The variation among individuals that lies behind a significant correlation coefficient]

The exceptions associated with modest correlations mean that a wide range of IQ scores can be observed in almost any job, including complex jobs such as engineer or physician, a fact that provides President Bok and other critics of the importance of IQ with an abundant supply of exceptions to any general relationship. The exceptions do not invalidate the importance of a statistically significant correlation.

Restriction of range. In any particular job setting, there is a restricted range of cognitive ability, and the relationship between IQ scores and job performance is probably very weak in that setting. Forget about IQ for a moment and think about weight as a qualification for being an offensive tackle in the National Football League. The All-Pro probably is not the heaviest player. On the other hand, the lightest tackle in the league weighs about 250 pounds. That is what we mean by restriction of range. In terms of correlation coefficients, if we were to rate the performance of every NFL offensive tackle and then correlate those ratings with their weights, the result would probably be a correlation near zero. Should we then approach the head coaches of the NFL and recommend that they try out a superbly talented 150-pound athlete at offensive tackle? The answer is no. We would be right in concluding that performance does not correlate much with weight among NFL tackles, whose weights range upward from around 250 pounds, but wrong to conclude anything about the correlation between weight and performance in the general population. Imagine a sample of ordinary people drawn from the general population and inserted into an offensive line. The correlation between the performance of these people as tackles in football games and their weights would be large indeed. The difference between these two correlations—one for the actual tackles in the NFL and the other a hypothetical one for people at large—illustrates the impact of restriction of range on correlation coefficients.6
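The effect is easy to reproduce in simulation. The sketch below builds a hypothetical population in which weight genuinely drives performance, then recomputes the correlation after discarding everyone under 290 pounds. All of the numbers are our own illustrative assumptions, not NFL data.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200_000

    # A hypothetical general population in which weight genuinely matters.
    weight = rng.normal(190, 40, n)                # pounds (illustrative)
    performance = weight + rng.normal(0, 40, n)    # performance rating

    r_population = np.corrcoef(weight, performance)[0, 1]

    # Keep only the NFL-sized: the same relationship, but a truncated range.
    heavy = weight >= 290
    r_restricted = np.corrcoef(weight[heavy], performance[heavy])[0, 1]

    print(f"general population: r = {r_population:.2f}")   # about .7
    print(f"290 pounds and up:  r = {r_restricted:.2f}")   # about .3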

Confusion between a credential and a correlation. Would it be silly to require someone to have a minimum score on an IQ test to get a license as a barber? Yes. Is it nonetheless possible that IQ scores are correlated with barbering skills? Yes. Later in the chapter, we discuss the economic pros and cons of using a weakly correlated score as a credential for hiring, but here we note simply that some people confuse a well-founded opposition to credentialing with a less well-founded denial that IQ correlates with job performance.7

The weaknesses of individual studies. Until the last decade, even the experts had reason to think that the relationship must be negligible. Scattered across journals, books, technical reports, conference proceedings, and the records of numberless personnel departments were thousands of samples of workers for whom there were two measurements: a cognitive ability test score of some sort and an estimate of proficiency or productivity of some sort. Hundreds of such findings were published, but every aspect of this literature confounded any attempt to draw general conclusions. The samples were usually small, the measures of performance and of worker characteristics varied and were more or less unreliable and invalid, and the ranges were restricted for both the test score and the performance measure. This fragmented literature seemed to support the received wisdom: Tests were often barely predictive of worker performance and different jobs seemed to call for different predictors. And yet millions of people are hired for jobs every year in competition with other applicants. Employers make those millions of choices by trying to guess which will be the best worker. What then is a fair way for the employer to make those hiring decisions?

Since 1971, the answer to that question has been governed by a landmark Supreme Court decision, Griggs v. Duke Power Co.8 The Court held that any job requirement, including a minimum cutoff score on a mental test, must have a “manifest relationship to the employment in question” and that it was up to the employer to prove that it did.9 In practice, this evolved into a doctrine: Employment tests must focus on the skills that are specifically needed to perform the job in question.10 An applicant for a job as a mechanic should be judged on how well he does on a mechanical aptitude test, while an applicant for a job as a clerk should be judged on tests measuring clerical skills, and so forth. So decreed the Supreme Court, and why not? In addition to the expert testimony before the Court favoring it, it seemed to make good common sense.

THE RECEIVED WISDOM OVERTURNED

The problem is that common sense turned out to be wrong. In the last decade, the received wisdom has been repudiated by research and by common agreement of the leading contemporary scholars.11 The most comprehensive modern surveys of the use of tests for hiring, promotion, and licensing, in civilian, military, private, and government occupations, repeatedly point to three conclusions about worker performance, as follows.

1. Job training and job performance in many common occupations are well predicted by any broadly based test of intelligence, as compared to narrower tests more specifically targeted to the routines of the job. As a corollary: Narrower tests that predict well do so largely because they happen themselves to be correlated with tests of general cognitive ability.

2. Mental tests predict job performance largely via their loading on g.

3. The correlations between tested intelligence and job performance or training are higher than had been estimated prior to the 1980s. They are high enough to have economic consequences.

We state these conclusions qualitatively rather than quantitatively so as to span the range of expert opinion. Whereas experts in employee selection accept the existence of the relationship between cognitive ability and job performance, they often disagree with each other’s numerical conclusions. Our qualitative characterizations should be acceptable to those who tend to minimize the economic importance of general cognitive ability and to those at the other end of the range.12

Why has expert opinion shifted? The answer lies in a powerful method of statistical analysis that was developing during the 1970s and came of age in the 1980s. Known as meta-analysis, it combines the results from many separate studies and extracts broad and stable conclusions.13 In the case of job performance, it was able to combine the results from hundreds of studies. Experts had long known that the small samples and the varying validities, reliabilities, and restrictions of range in such studies were responsible to some extent for the low, negligible, or unstable correlations. What few realized was how different the picture would look when these sources of error and underestimation were taken into account through meta-analysis.14 Taken individually, the studies said little that could be trusted or generalized; properly pooled, they were full of gold. The leaders in this effort—psychologists John Hunter and Frank Schmidt have been the most prominent—launched a new epoch in understanding the link between individual traits and economic productivity.
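A bare-bones version of the idea can be sketched in a few lines of Python. The five studies and the criterion reliability below are invented for illustration; the sample-size weighting and the attenuation correction, however, are standard ingredients of the Hunter-Schmidt approach.

    import math

    # Five invented studies of the same job: (sample size, observed validity).
    studies = [(45, 0.18), (60, 0.25), (30, 0.05), (120, 0.31), (50, 0.22)]

    # Step 1: pool the studies, weighting each by its sample size, so that
    # small-sample noise largely cancels out.
    total_n = sum(n for n, _ in studies)
    r_bar = sum(n * r for n, r in studies) / total_n

    # Step 2: correct for unreliability of the criterion. Supervisor ratings
    # measure true performance imperfectly; r_yy is an assumed reliability.
    r_yy = 0.60
    r_corrected = r_bar / math.sqrt(r_yy)

    print(f"weighted mean observed validity: {r_bar:.2f}")        # about .24
    print(f"corrected for criterion error:   {r_corrected:.2f}")  # about .31
    # A full analysis also corrects for restriction of range, which raises
    # the estimate further; that correction is sketched later in the chapter.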

THE LINK BETWEEN COGNITIVE ABILITY AND JOB PERFORMANCE

We begin with a review of the evidence that an important statistical link between IQ and job performance does in fact exist. In reading the discussion that follows, remember that job performance does vary in the real world, and the variations are not small. Think of your own workplace and of the people who hold similar jobs. How large is the difference between the best manager and the worst? The best and worst secretary? If your workplace is anything like ours have been, the answer is that the differences are large indeed. Outside the workplace, what is it worth to you to have the name of a first-rate plumber instead of a poor one? A first-rate auto mechanic instead of a poor one? Once again, the common experience is that job performance varies widely, with important, tangible consequences for our everyday lives.

Nor is variation in job performance limited to skilled jobs. Readers who have ever held menial jobs know this firsthand. In restaurants, there are better and worse dishwashers, better and worse busboys. There are better and worse ditch diggers and garbage collectors. People who work in industry know that no matter how apparently mindless a job is, the job can still be done better or worse, with significant economic consequences. If the consequences are significant, it is worth knowing what accounts for the difference.

Job performance may be measured in many different ways.15 Sometimes it is expressed as a natural quantitative measure (how many units a person produces per hour, for example), sometimes as structured ratings by supervisors or peers, sometimes as analyses of a work sample. When these measures of job productivity are correlated with measures of intelligence, the overall correlation, averaged over many tests and many jobs, is about .4. In the study of job performance and tests, the correlation between a test and job performance is usually referred to as the validity of the test, and we shall so refer to it for the rest of the discussion.16 Mathematically, validity and the correlation coefficient are identical. Later in the chapter we will show that a validity of .4 has large economic implications, and that even validities half as large are worth worrying about.

This figure of .4 is no more than a point of reference. As one might expect, the validities are higher for complex jobs than for simple ones. In Edwin Ghiselli’s mammoth compilation of job performance studies, mostly from the first half of the century, a reanalysis by John Hunter found a mean validity of .53 for the job family labeled “manager” and .46 for a “trades and crafts worker.” Even an “elementary industrial worker” had a mean validity of .37.17

The Ghiselli data were extremely heterogeneous, with different studies using many different measures of cognitive ability, and include data that are decades old. A more recent set of data is available from a meta-analysis of 425 studies of job proficiency as predicted by the General Aptitude Test Battery (GATB), the U.S. Labor Department’s cognitive ability test for the screening of workers. The table below summarizes the results of John and Ronda Hunter’s reanalysis of these databases.18

The average validity in the meta-analysis of the GATB studies was .45.19 The only job category with validity lower than .40 was the industrial category of “feeding/offbearing”—putting something into a machine or taking it out—which occupies fewer than 3 percent of U.S. workers in any case. Even at that bottom-most level of unskilled labor, measured intelligence did not entirely lose its predictiveness, with a mean validity of .23.

The Validity of the GATB for Different Types of Jobs

                                          GATB Validity for:
                                      Proficiency   Training   % of U.S. Workers in
Job Complexity                          Ratings      Success   These Occupations

General job families
  High (synthesizing/coordinating)        .58          .50           14.7
  Medium (compiling/computing)            .51          .57           62.7
  Low (comparing/copying)                 .40          .54           17.7
Industrial job families
  High (setup work)                       .56          .65            2.5
  Low (feeding/offbearing)                .23          NA             2.4

Source: Hunter and Hunter 1984, Table 2.

The third major database bearing on this issue comes from the military, and it is in many ways the most satisfactory. The AFQT (Armed Forces Qualification Test) is extracted from the scores on several tests that everyone in the armed forces takes. It is an intelligence test, highly loaded on g. Everyone in the military goes to training schools, and everyone is measured for training success at the end of their schooling, with “training success” based on measures that directly assess job performance skills and knowledge. The job specialties in the armed forces include most of those found in the civilian world, as well as a number that are not (e.g., combat). The military keeps all of these scores in personnel files and puts them on computers. The resulting database has no equal in the study of job productivity.

We will be returning to the military data for a closer look when we turn to subjects for which they are uniquely suited. For now, we will simply point out that the results from the military conform to the results in the civilian job market. The results for training success in the four major job families are shown in the table below. They are based on data from 828 military schools and 472,539 military personnel. The average validity was .62. The results hold true for individual schools as well. Even in the lowest-validity school, combat, where training success is heavily dependent on physical skills, the validity was still a substantial .45.20

The Validity of the AFQT for Military Training

Military Job Family      Mean Validity of AFQT Score and Training Success

Mechanical                    .62
Clerical                      .58
Electronic                    .67
General technical             .62

Source: Hunter 1985, Table 3.

The lowest modern estimate of validity for cognitive ability is the one contained in the report by a panel convened by the National Academy of Sciences, Fairness in Employment Testing.21 That report concluded that the mean validity is only about .25 for the GATB, in contrast to the Hunter estimate of .45 (which we cited earlier). Part of the reason was that the Hartigan committee (we name it for its chairman, Yale statistician John Hartigan), analyzing 264 studies after 1972, concluded that validities had generally dropped in the more recent studies. But the main source of the difference in validities is that the committee declined to make any correction whatsoever for restriction of range (see above and note 6). It was, in effect, looking at just the tackles already in the NFL; Hunter was considering the population at large. The Hartigan committee’s overriding concern, as the title of their report (Fairness in Employment Testing) indicates, was that tests not be used to exclude people, especially blacks, who might turn out to be satisfactory workers. Given that priority, the committee’s decision not to correct for restriction of range makes sense. But failing to correct for restriction of range produces a misleadingly low estimate of the overall relationship of IQ to job performance and its economic consequences.22 Had the Hartigan committee corrected for restriction of range, the estimates of the relationship would have been .35 to .40, not much less than Hunter’s.
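To see how much this one methodological choice matters, here is the standard range-restriction correction (Thorndike’s Case II) applied to the committee’s figure of .25. The formula is textbook psychometrics; the values of u, the ratio of the standard deviation of test scores among incumbents to that among all applicants, are assumptions we supply for illustration.

    import math

    def correct_for_range_restriction(r, u):
        # Thorndike Case II: r is the correlation observed in the restricted
        # group; u is the ratio of restricted to unrestricted SD of scores.
        U = 1 / u
        return r * U / math.sqrt(1 + r**2 * (U**2 - 1))

    for u in (0.60, 0.65, 0.70):
        r_full = correct_for_range_restriction(0.25, u)
        print(f"u = {u:.2f}: corrected validity = {r_full:.2f}")
    # Prints roughly .40, .37, and .35: the .35 to .40 band cited in the text.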

THE REASONS FOR THE LINK BETWEEN COGNITIVE ABILITY AND JOB PERFORMANCE

Why are job performance and cognitive ability correlated? Surgeons, for example, will be drawn from the upper regions of the IQ distribution. But isn’t it possible that all one needs is “enough” intelligence to be a surgeon, after which “more” intelligence doesn’t make much difference? Maybe small motor skills are more important. And yet “more” intelligence always seems to be “better,” for large groups of surgeons and every other profession. What is going on that produces such a result?

Specific Skills or g?

As we begin to explore this issue, the story departs more drastically from the received wisdom. One obvious, commonsense explanation is that an IQ test indirectly measures how much somebody knows about the specifics of a job and that that specific knowledge is the relevant thing to measure. According to this logic, more general intellectual capacities are beside the point. But the logic, however commonsensical, is wrong. Surprising as it may seem, the predictive power of tests for job performance lies almost completely in their ability to measure the most general form of cognitive ability, g, and has little to do with their ability to measure aptitude or knowledge for a particular job.

SPECIFIC SKILLS VERSUS G IN THE MILITARY. The most complete data on this issue come from the armed services, with their unique advantages as an employer that trains hundreds of thousands of people for hundreds of job specialties. We begin with them and then turn to the corresponding data from the civilian sector.

In assigning recruits to training schools, the services use particular combinations of subtests from a test battery that all recruits take, the Armed Services Vocational Aptitude Battery (ASVAB).23 The Pentagon’s psychometricians have tried to determine whether there is any practical benefit of using different weightings of the subtests for different jobs rather than, say, just using the overall score for all jobs. The overall score is itself tantamount to an intelligence test. One of the most comprehensive studies of the predictive power of intelligence tests was by Malcolm Ree and James Earles, who had both the intelligence test scores and the final grades from military school for over 78,000 air force enlisted personnel spread over eighty-nine military specialties. The personnel were educationally homogeneous (overwhelmingly high school graduates without college degrees), conveniently “controlling” for educational background.24

What explains how well they performed? For every one of the eighty-nine military schools, the answer was g—Charles Spearman’s general intelligence. The correlations between g alone and military school grade ranged from an almost unbelievably high .90 for the course for a technical job in avionics repair down to .41 for the course for a low-skill job associated with jet engine maintenance.25 Most of the correlations were above .7. Overall, g accounted for almost 60 percent of the observed variation in school grades in the average military course, once the results were corrected for range restriction (the accompanying note spells out what it means to “account for 60 percent of the observed variation”).26

Did cognitive factors other than g matter at all? The answer is that the explanatory power of g was almost thirty times greater than that of all other cognitive factors in the ASVAB combined. The table below gives a sampling of the results from the eighty-nine specialties, to illustrate the two commanding findings: g alone explains an extraordinary proportion of training success; “everything else” in the test battery explains very little.

The Role of g in Explaining Training Success for Various Military Specialties

                                     Percentage of Training Success Explained by:
Enlisted Military Skill Category              g          Everything Else

Nuclear weapons specialist                  77.3               0.8
Air crew operations specialist              69.7               1.8
Weather specialist                          68.7               2.6
Intelligence specialist                     66.7               7.0
Fireman                                     59.7               0.6
Dental assistant                            55.2               1.0
Security police                             53.6               1.4
Vehicle maintenance                         49.3               7.7
Maintenance                                 28.4               2.7

Source: Ree and Earles 1990a, Table 9.

An even larger study, not quite as detailed, involving almost 350,000 men and women in 125 military specialties in all four armed services, confirmed the predominant influence of g and the relatively minor further predictive power of all the other factors extracted from ASVAB scores.27 Still another study, of almost 25,000 air force personnel in thirty-seven different military courses, similarly found that the validity of individual ASVAB subtests in predicting the final technical school grades was highly correlated with the g loading of the subtest.28

EVIDENCE FROM CIVILIAN JOBS. There is no evidence to suggest that military jobs are unique in their dependence on g. However, scholars in the civilian sector are at a disadvantage to their military colleagues; nothing approaches the military’s database on this topic. In one of the few major studies involving civilian jobs, performance in twenty-eight occupations correlated virtually as well with an estimate of g from GATB scores as it did with the most predictively weighted individual subtest scores in the battery.29 The author concluded that, for samples in the range of 100 to 200, a single factor, g, predicts job performance as well as, or better than, batteries of weighted subtest scores. With larger samples, for which it is possible to pick up the effect of less potent influences, there may be some modest extra benefit of specialized weighted scores. At no level of sampling, however, does g become anything less than the best single predictor known, across the occupational spectrum. Perhaps the most surprising finding has been that tests of general intelligence often do better in predicting future job performance than do contrived tests of job performance itself. Attempts to devise measures that are specifically keyed to a job’s tasks—for example, tests of filing, typing, answering the telephone, searching in records, and the like for an office worker—often yield low-validity tests, unless they happen to measure g, as a vocabulary test does. Given how pervasive g is, it is almost impossible to miss it entirely with any test, but some tests are far more efficient measures of it than others.30

Behind the Test Scores

Let us try to put these data in the framework of everyday experience. Why should it be that variation in general cognitive ability, g, is more important than job-specific skills and knowledge? We will use the job of busboy as a specific example, asking the question: At a run-of-the-mill family restaurant, what distinguishes a really good busboy from an average one?

Being a busboy is a straightforward job. The waiter takes the orders, deals with the kitchen, and serves the food while the busboy totes the dirty dishes out to the kitchen, keeps the water glasses filled, and helps the waiter serve or clear as required. In such a job, a high IQ is not required. One may be a good busboy simply with diligence and good spirits. But complications arise. A busboy usually works with more than one waiter. The restaurant gets crowded. A dozen things are happening at once. The busboy is suddenly faced with queuing problems, with setting priorities. A really good busboy gets the key station cleared in the nick of time, remembering that a table of new orders near that particular station is going to be coming out of the kitchen; when he goes to the kitchen, he gets a fresh water pitcher and a fresh condiment tray to save an extra trip. He knows which waiters appreciate extra help and when they need it. The point is one that should draw broad agreement from readers who have held menial jobs: Given the other necessary qualities of diligence and good spirits, intelligence helps. The really good busboy is engaged in using g when he is solving the problems of his job, and the more g he has, the more quickly he comes up with the solutions and can call on them when appropriate.

Now imagine devising a test that would enable an employer to choose the best busboy among applicants. One important aspect of the test would measure diligence and good spirits. Perhaps the employer should weigh the results of this part of the test more heavily than anything else, if his choice is between a diligent and cheerful applicant and a slightly smarter but sulky one. But when it comes to measuring performance in general for most applicants, it is easy to see why the results will match the findings of the literature we just discussed. Job-specific items reveal mostly whether an applicant has ever been a busboy before. But that makes very little difference to job productivity, because a bright person can pick up the basic routine in the course of a few shifts. The g-loaded items, on the other hand, will reveal whether the applicant will ever become the kind of busboy who will clear table 12 before he clears table 20 because he relates the needed task to something that happened twenty minutes earlier regarding table 15. And that is why employers who want to select productive busboys should give applicants a test of general intelligence rather than a test of busboy skills. The kind of test that would pass muster with the courts—a test of job-specific skills—is a less effective kind of test to administer. What applies to busboys applies ever more powerfully as the jobs become more complex.

DOES MORE EXPERIENCE MAKE UP FOR LESS INTELLIGENCE?

The busboy example leads to another question that bears on how we should think about cognitive ability and job productivity: How much can experience counterbalance ability? Yes, the smart busboy will be more productive than the less-smart busboy a week into the job, and, yes, perhaps there will always be a few things that the smart busboy can do that the less smart cannot. But will the initial gap in productivity narrow as the less-smart busboy gains experience? How much, and how quickly?

Separately, job performance relates to both experience and intelligence, but the relationships differ.31 That is, people who are new to a job learn quickly at first, then more slowly. A busboy who has, say, one month on the job may for that reason outperform someone who started today, but the one-month difference in experience will have ceased to matter in six months. No comparable leveling-off effect has been observed for increasing intelligence. Wherever on the scale of intelligence pairs of applicants are, the smarter ones not only will outperform the others, on the average, but the benefit of having a score that is higher by a given amount is approximately the same throughout the range. Or, to put it more conservatively, no one has produced good evidence of diminishing returns to intelligence.32

But what happens when both factors are considered jointly? Do employees of differing intelligence converge after some time on the job? If the answer were yes, then it could be argued that hiring less intelligent people imposes only a limited and passing cost. But the answer seems to be closer to no than to yes, although much remains to be learned.

Some convergence has been found when SATs are used as the measure of ability and grade point average is used as the measure of achievement.33 Students with differing SATs sometimes differ more in their freshman grades than in later years. That is why President Bok granted predictive value to the SAT only for first-year grades.34 On the other hand, the shrinking predictive power may be because students learn which courses they are likely to do well in: They drop out of physics or third-year calculus, for example, and switch to easier courses. They find out which professors are stingy with A’s and B’s. At the U.S. Military Academy, where students have very little choice in courses, there is no convergence in grades.35

When it comes to job performance, the balance of the evidence is that convergence either does not occur or that the degree of convergence is small. This was the finding of a study of over 23,000 civilian employees at three levels of mental ability (high, medium, and low), using supervisor ratings as the measure of performance, and it extended out to job tenures of twenty years and more.36 A study of four military specialties (armor repairman, armor crewman, supply specialist, cook) extending out to five years of experience and using three different measures of job performance (supervisor’s ratings, work sample, and job knowledge) found no reliable evidence of convergence.37 Still another military study, which examined several hundred marines working as radio repairmen, automotive mechanics, and riflemen, found no convergence among personnel of differing intelligence when job knowledge was the measure of performance but did find almost complete convergence after a year or so when a work sample was the measure.38

Other studies convey a similarly mixed picture.39 Some experts are at this point concluding that convergence is uncommon in the ordinary range of jobs.40 It may be said conservatively that for most jobs, based on most measures of productivity, the difference in productivity associated with differences in intelligence diminishes only slowly and partially. Often it does not diminish at all. The cost of hiring less intelligent workers may last as long as they stay on the job.

TEST SCORES COMPARED TO OTHER PREDICTORS OF PRODUCTIVITY

How good a predictor of job productivity is a cognitive test score compared to a job interview? Reference checks? College transcript? The answer, probably surprising to many, is that the test score is a better predictor of job performance than any other single measure. This is the conclusion to be drawn from a meta-analysis of the different predictors of job performance, as shown in the table below.

The Validity of Some Different Predictors of Job Performance

Predictor                 Validity Predicting Job Performance Ratings

Cognitive test score                        .53
Biographical data                           .37
Reference checks                            .26
Education                                   .22
Interview                                   .14
College grades                              .11
Interest                                    .10
Age                                        −.01

Source: Hunter and Hunter 1984.

The data used for this analysis were top-heavy with higher-complexity jobs, yielding a higher-than-usual validity of .53 for test scores. However, even if we were to substitute the more conservative validity estimate of .4, the test score would remain the best predictor, though with close competition from biographical data.41 The method that many people intuitively expect to be the most accurate, the job interview, has a poor record as a predictor of job performance, with a validity of only .14.

Readers who are absolutely sure nonetheless that they should trust their own assessment of people rather than a test score should pause to consider what this conclusion means. It is not that you would select a markedly different set of people through interviews than test scores would lead you to select. Many of the decisions would be the same. The results in the table say, in effect, that among those choices that would be different, the employees chosen on the basis of test scores will on average be more productive than the employees chosen on the basis of any other single item of information.

THE DIFFERENCE INTELLIGENCE MAKES

We arrive finally at the question of what it all means. How important is the overall correlation of .4, which we are using as our benchmark for the relation between intelligence and job performance? The temptation may be to say, not very. As we showed before, there will be many exceptions to the predicted productivity with correlations this modest. And indeed it is not very important when an employer needs just a few new employees for low-complexity jobs and is choosing among a small group of job applicants who have small differences in test scores. But the more reality departs from this scenario, the more important cognitive ability becomes.

The Dollar Value of Cognitive Ability

How much is the variation in job performance worth? To answer that question, we need a measure in dollars of how much the workers in a given occupation vary. (Some of the methods for making this measurement are recounted in the notes, to which we refer readers who would like more detail.)42 To cut a long story short, think now of a particular worker—a secretary, let us say. You have a choice between hiring an average secretary, who by definition is at the 50th percentile, or a first-rate one—at the 84th percentile, let us say. If you were free to set their salaries at the figures you believe to reflect their true worth, how different would they be? We imagine that anyone who has worked with average secretaries and first-rate ones will answer “a lot.” The consensus among experts has been that, measured in dollars, “a lot” works out, on the average, to about a 40 percent premium.

Put more technically and precisely, one standard deviation of the distribution of workers’ annual productivities in a typical occupation is worth 40 percent of the average worker’s annual income.43 New work suggests the premium may actually be twice as large. Since the larger estimate has yet to be confirmed, we will base our calculations on the more conservative estimate.44 To take a specific example, for a $20,000-a-year job, which is correctly priced for an average worker, the incremental value of hiring a new worker who is one standard deviation above the mean—at the 84th percentile—is $8,000 per year.45 Hiring a worker for a $20,000-a-year job who is one standard deviation below the mean—at the 16th percentile—would cost the employer $8,000 in lost output.

The standard deviation for output is usually larger for more complex jobs.46 This makes intuitive sense: an assembly-line worker can do his job well or poorly, but ordinarily the gap that separates the proficiency of the 16th and 84th percentiles of assembly-line workers is not as great measured in the dollar value of the output as the gap that separates the proficiency of the 16th and 84th percentiles of engineers. But when we match this fact against an additional fact—that engineers make a lot more money than assembly-line workers—we are faced with what is known in statistics as an interaction effect. Getting high quality for a complex job can be worth large multiples of what it is worth to get equally high quality for a simpler job.

We may make this concrete with some hypothetical calculations. Imagine a dental office, consisting of dentist and receptionist. Assume that the annual salary of an average dentist is $100,000 and that of the receptionist $25,000, and that these are correctly priced. For whatever reasons, society finds the dentist to be worth four times as much as the receptionist.47 Suppose further that you are an employer—a Health Maintenance Organization (HMO), for example—who hires both dentists and receptionists. By using a certain selection procedure, you can improve the quality of your new hirees, so that instead of hiring people who are, on average, at the 50th percentile of proficiency (which is what would happen if you picked randomly from the entire pool of receptionists and dentists looking for jobs), you instead could hire people who are, on average, at the 84th percentile. What is this screening procedure worth to you?

For the value of the output produced, we use a standard deviation of .5 of the annual income for dentists and of .15 for that of receptionists, based on values actually observed.48 The answer, given these numbers, is that it is worth $50,000 a year for the dentist and $3,750 per year for the receptionist to hire people who are one standard deviation above average in proficiency—not the ratio of four to one that separates the dentist’s wages from the receptionist’s but a ratio of more than thirteen to one.49

We are not home yet, for although we know what it is worth to hire these more proficient dentists and receptionists, we have not yet factored in the validity of the selection test. The correlation between test score and proficiency is roughly .6 for dentists and .2 for receptionists, again based on observation and approximating the top and bottom of the range illustrated in the figure below. Given that information, we may estimate the expected output difference between two dentists who score at the 50th and 84th percentiles on an intelligence test as being worth $30,000 a year.50 The corresponding difference between two receptionists who score at the 50th and 84th percentiles in intelligence is $750 a year. And this is what we meant by an “interaction effect”: the wage of the dentist is only four times that of the receptionist. But the value to the employer of hiring brighter dentists is forty times greater than the value of hiring comparably brighter receptionists.51
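The arithmetic of these examples reduces to a single line: expected dollar gain = validity × (SD of output as a fraction of salary) × salary × (how many SDs above average the hire scores on the selection measure). The helper below is our own reconstruction of that arithmetic using the chapter’s numbers; the function name and structure are ours, not anything from the sources cited.

    def value_of_better_hire(salary, output_sd_ratio, validity, z=1.0):
        # Expected annual output gain from a hire who scores z standard
        # deviations above the mean on the selection measure.
        return validity * output_sd_ratio * salary * z

    # A worker already known to be 1 SD better (perfect selection):
    print(value_of_better_hire(20_000, 0.40, 1.0))       # 8000.0, the secretary example

    # Selecting on an imperfect test, per the dentist/receptionist example:
    dentist = value_of_better_hire(100_000, 0.50, 0.6)       # 30000.0
    receptionist = value_of_better_hire(25_000, 0.15, 0.2)   # 750.0
    print(dentist / receptionist)                            # 40.0: the interaction effect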

In a real-life situation, the value of a test (or any other selection procedure) depends on another factor: How much choice does the employer have?52 There is no point in spending money on an intelligence test if only one applicant shows up. If ten applicants show up for the job, however, a test becomes attractive. The figure below illustrates the economic benefit of testing with different levels of competition for the job (from one to fifty applicants per job) and different tests (from a very poor one with a validity of .2 to a very strong one with a validity of .6).53 If everyone is hired, then, on average, the hired person is just at the average level of proficiency, which is a standard score of 0. But as soon as even two applicants are available per position, the value of testing rises quickly. With just two applicants per position, the employer gains 16 to 48 percent in productivity, depending on the validity of the test.54 The curve quickly begins to flatten out; much of the potential value of testing has already been captured when there are three applicants per job. This figure is an answer to those who claim that a correlation of, say, .4 is too small to bother with.55 A validity of .4 (or even .6) may be unimportant if almost all applicants are hired, but even a correlation of .2 (or still smaller) may be important if only a small proportion gets hired.

[Figure: The advantages of hiring by test score]
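The curves in that figure follow from a standard result: when the top fraction p of a normally distributed applicant pool is hired on a test, the hires average φ(c)/p standard deviations above the mean in test score (the mean of the upper tail of the normal curve), and their expected productivity advantage is that quantity times the test’s validity. The sketch below reproduces the numbers in the text under that assumption; reading the gain in hundredths of a standard deviation is our interpretation of the “percent” figures.

    from scipy.stats import norm

    def mean_test_score_of_hires(p):
        # Mean standardized score of hires when the top fraction p of a
        # normally distributed applicant pool is selected (truncated normal).
        cutoff = norm.ppf(1 - p)
        return norm.pdf(cutoff) / p

    for applicants_per_opening in (2, 3, 10, 50):
        zbar = mean_test_score_of_hires(1 / applicants_per_opening)
        gains = ", ".join(f"validity {v:.1f}: {v * zbar:.2f} SD"
                          for v in (0.2, 0.4, 0.6))
        print(f"{applicants_per_opening:>2} applicants per opening -> {gains}")
    # With 2 applicants per opening, the gain runs from .16 to .48 standard
    # deviations of productivity: the "16 to 48 percent" cited in the text.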

The Macroeconomic Costs of Not Testing

Since the pivotal Supreme Court decision of Griggs v. Duke Power Co. in 1971, no large American employer has been able to hire from the top down based on intelligence tests. Estimates vary widely for how much the American economy loses by not doing so, from what Hunter and Hunter conclude is a minimum loss of $80 billion in 1980 (and in 1980 dollars) to what the Hartigan committee thought was a maximum loss of $13 billion for that year.56 The wide range reflects the many imponderables in making these calculations. For one thing, many attributes of an applicant other than a test score are correlated with intelligence—educational level, for example. Schooling captures some, but not all, of the predictive value of intelligence. Or consider an employer using family connections to hire instead of tests. A bright worker is likely to have a bright sister or brother. But the average IQ score difference between siblings is eleven or twelve points, so, again, test scores would predict proficiency better than judging an applicant by the work of a brother or sister.

Modeling the economic impact of testing has additional complexities. It has been noted that the applicant pool would gradually get depleted of the top scorers when every successive employer tries to hire top down.59 As the smart people are hired and thereby removed from the applicant pool, the validity of a test for those still on the job market may change because of, for example, restriction of range. The economic benefit of using a test would then decline. But if testing tended to place the smartest people in the jobs where the test-job correlations are large, the spread of the productivity distributions is broad, the absolute levels of output value are high, and the proportions hired are small, the benefits could be huge, even if the economic effects of testing the last people in the pool are negligible. In short, figuring out the net effects of testing or not testing is no small matter. No one has yet done it conclusively.

When Only the Best Will Do

A selection ratio of one in fifty may seem unrealistic, and so it is for the run-of-the-mill job. But for the most competitive jobs, much higher ratios, up to one in several hundred, are common. Consider the handful of new openings in top law firms or for internships in the most desirable research hospitals or in the richest investment banking firms for which each year’s new graduates are competing. Many potential applicants select themselves out of the pool for those prized jobs, realizing that the openings will be filled by people with stronger credentials, but they must nevertheless be reckoned as being part of the applicant pool in order to get a realistic estimate of the importance of cognitive ability. This is again the issue exemplified by the weight of offensive tackles, discussed earlier in the chapter.

The question arises whether the employer gains much by a rigorous selection process for choosing among the people who actually do show up at the job interview. Aren’t they already so highly screened that they are, in effect, homogeneous? The answer is intimately related to the size of the stakes. When the job is in a top Wall Street firm, for example, the dollar value of output is so high that the difference between a new hiree who is two standard deviations above the mean and one who is four standard deviations above the mean on any given predictor measure can mean a huge economic difference, even though the “inferior” applicant is already far into the top few centiles in ability.

WHY PARTITIONING IS INEVITABLE

To recapitulate a complex discussion: Proficiency in most common civilian and military occupations can be predicted by IQ, with an overall validity that may conservatively be placed at .4. The more demanding a job is cognitively, the more predictive power such a test has, but no common job is so undemanding that the test totally lacks predictiveness. For the job market as a whole, cognitive ability predicts proficiency better than any other known variable describing an individual, including educational level. Intelligence tests are usually more predictive of proficiency than are paper-and-pencil tests that are specifically based on a job’s activities. For selecting large numbers of workers, there may be some added predictive power, usually small, when a score on a narrower test of performance is combined with an intelligence test. For low-complexity jobs, a test of motor skill often adds materially to predictiveness. The predictive power of IQ derives from its loading on g, in Spearman’s sense of general intelligence.

Choosing Police Applicants by IQ

A case study of what happens when a public service is able to hire from the top down on a test of cognitive ability, drawing on a large applicant pool, comes out of New York City. In April 1939, after a decade of economic depression, New York City attracted almost 30,000 men to a written and physical examination for 300 openings in the city’s police force, a selection ratio of approximately one in a hundred.57 The written test was similar to the intelligence test then being given by the federal civil service. Positions were offered top down for a composite score on the mental and physical tests, with the mental test more heavily weighted by more than two to one. Not everyone accepted the offer, but, times being what they were, the 300 slots were filled by men who earned the top 350 scores. Inasmuch as the performance of police officers has been shown to correlate significantly with scores on intelligence tests,58 this group of men should have made outstanding policemen. And they did, achieving extraordinarily successful careers in and out of policing. They attained far higher than average rank as police officers. Of the entire group, four became police chiefs, four deputy commissioners, two chiefs of personnel, one a chief inspector, and one commissioner of the New York Police Department. They suffered far fewer disciplinary penalties, and they contributed significantly to the study and teaching of policing and law enforcement. Many also had successful careers as lawyers, businessmen, and academics after leaving the police department.

If we were writing a monograph for personnel managers, the appropriate next step would be to present a handbook of tables for computing when it makes economic sense to test new applicants (ignoring for the moment legislative and judicial restrictions on such testing). Such a calculation would be based on four variables: the predictive power of the test for the job at hand, the variation in worker productivity for the job at hand, the proportion of job applicants that are to be selected, and the cost of testing. The conclusion would often be that testing is profitable. Even a marginally predictive test can be economically important if only a small fraction of applicants is to be selected. Even a marginally predictive test may have a telling economic impact if the variation in productivity is wide. And for most occupations, the test is more than marginally predictive. In the average case, a test with a .4 validity, the employer who uses a cognitive test captures 40 percent of the profit that would be realized from a perfectly predictive test—no small advantage. In an era when a reliable intelligence test can be administered in twelve minutes, the costs of testing can be low—lower in terms of labor than, for example, conducting an interview or checking references.

We are not writing a monograph for personnel managers, however, and the main point has nothing to do with whether one favors or opposes the use of tests as a hiring device. The main point is rather that intelligence itself is importantly related to job performance. Getting rid of intelligence tests in hiring—as policy is trying to do—will not get rid of the importance of intelligence. The alternatives that employers have available to them—biographical data, reference checks, educational record, and so forth—are valid predictors of job performance in part because they imperfectly reflect something about the applicant’s intelligence. Employers who are forbidden to obtain test scores nonetheless strive to obtain the best possible work force, and it so happens that the way to get the best possible work force, other things equal, is to hire the smartest people they can find. It is not even necessary for employers to be aware that intelligence is the attribute they are looking for. As employers check their hiring procedures against the quality of their employees and refine their procedures accordingly, the importance of intelligence in the selection process converges on whatever real importance it has for the job in question, whether or not they use a formal test.

The economic value of employees is linked to their intelligence, and so, ultimately, are their wages. Let us consider that issue in the next chapter, along with some others that have interlocking implications as we try to foresee, however dimly, what the future holds for the cognitive elite.