Unweaving the Rainbow: Science, Delusion and the Appetite for Wonder - Richard Dawkins (2000)
Chapter 5. BARCODES AT THE BAR
And he said, Woe unto you also, ye lawyers! for ye lade men with burdens grievous to be borne, and ye yourselves touch not the burdens with one of your fingers.... Woe unto you, lawyers! for ye have taken away the key of knowledge: ye entered not in yourselves, and them that were entering in ye hindered.
On the face of it, the law may seem about as far as you can get from poetry or the wonder of science. Perhaps there is poetic beauty in the abstract ideas of justice or fairness, but I doubt if many lawyers are moved by it. In any case, that is not what this chapter is about. I shall be looking at an example of the role of science in the law: at a different aspect of science and its importance in society; a sense in which scientific understanding may become a valuable part of good citizenship. In courts of law, juries are increasingly asked to understand evidence which the lawyers themselves may not fully comprehend. Evidence from the unweaving of DNA—what we shall come to see as barcodes in the blood—is the outstanding example, and it is the main subject of this chapter. But it is not just facts about DNA that scientists can contribute. More importantly, it is the underlying theory of probability and statistics; it is scientific ways of making inferences that need to be brought to bear. Such matters stretch beyond the narrow subject of DNA evidence.
I am told on good authority that defence lawyers in the United States sometimes object to jury candidates on the grounds that they have had a scientific education. What can this mean? I would not question the right of defence lawyers to disallow the selection of particular jurors. A juror may be prejudiced against the race or class to which the defendant belongs. It is obviously undesirable that a raving homophobe should try a case of anti-homosexual violence. It is for this kind of reason that defence lawyers in some countries are allowed to cross-examine potential jurors and strike them off the list. In the USA lawyers can be completely blatant about their criteria for jury selection. A colleague tells me of a time when he was up for selection to a jury, on an injury litigation case. The lawyer asked, 'Would anyone here have a problem awarding a substantial amount of money to my client, perhaps in the millions?'
A lawyer can also disqualify a juror without giving reasons. Although this may be just, the only time I have seen it happen it misfired. I was a member of a panel of 24 individuals from which juries of 12 were to be selected. I had already participated in two juries with members of this panel, and I knew their individual foibles. One particular man was cast-iron prosecution fodder; he would take the same hard line almost regardless of the particular case. The defence lawyer waved him through like a breeze. The next one up, a large middle-aged woman, was the opposite: a guaranteed softie, a pure gift to the defence. But her appearance perhaps suggested the opposite, and it was against her that the defence lawyer chose to exercise his right of veto. I have never forgotten the look of wounded hurt on her face as, with a cutting movement of the hand, learned counsel struck her—whom he little knew could have been his secret weapon—out of the jury box.
But, to repeat the astonishing fact, lawyers in the United States have been known to use the following reason for striking down potential jurors: the prospective juror is well educated in science, or has some knowledge of genetics or probability theory. What is the problem? Are geneticists known to harbour deep-seated prejudices against certain sections of society? Are mathematicians especially likely to be of the 'flog 'em ... string 'em up ... it's the only language they understand ... law and order' persuasion? Of course not. Nobody has ever claimed such a thing.
The lawyers' objections are more ignobly based. There is a new kind of evidence increasingly coming into the criminal courts: evidence from DNA fingerprinting, and it is extremely powerful. If your client is innocent, DNA evidence may well provide a knock-down convincing way to establish his innocence. Conversely, if he is guilty, DNA evidence has a good chance of establishing his guilt in cases where no other evidence can. DNA evidence is quite hard to understand at the best of times. There are controversial aspects of it which are even harder. In these circumstances, you would think that an honest lawyer who wishes to see justice done would welcome jurors capable of grasping the arguments. Wouldn't it be an obviously good thing to have at least one or two people in the jury room who can redress the ignorance of their baffled colleagues? What kind of a lawyer is it who prefers a jury incapable of following the case that either attorney is making?
The answer is a lawyer who is more interested in winning than in seeing justice done. A lawyer, in other words. And it seems to be a fact that advocates, of both prosecution and defence, frequently disallow individual jurors specifically because they are educated in science.
Courts of law have always needed to establish individual identity. Was the individual seen hurrying from the scene Richard Dawkins? Is the hat dropped at the scene of the crime his hat? Are those his fingerprints on the weapon? A yes answer to one of these questions does not by itself prove his guilt, but it is certainly an important factor to be taken into account. Most of us, including most jurors and lawyers, have an intuitive sense that there is something specially reliable about eye-witness evidence. In this we are almost certainly wrong, but the error is a pardonable one. It may even be built into us by millennia of evolutionary history in which eye-witness evidence really was the most reliable. If I see a man in a red woolly hat climbing a drainpipe, you will have a hard time persuading me later that he was actually wearing a blue beret. Our intuitive biases are such that eye-witness evidence trumps all other categories. Yet numerous studies have shown that eye-witnesses, however convinced they may be, however sincere and well-meaning, frequently misremember even conspicuous details such as the colour of clothing and the number of assailants present.
When individual identification is important, for instance when a woman who has been raped is called upon to identify her attacker, courts perform a rudimentary statistical test known as the identity parade or line-up. The woman is led past a line of men, one of whom the police suspect on other grounds. The others have been pulled in off the streets or are out-of-work actors, or police officers dressed in plain clothes. If the woman picks out one of these stooges, her identification evidence is discounted. But if she picks out the man the police already suspect, her evidence is taken seriously.
Rightly so. Especially if the number of people in the identity parade is large. We are all statisticians enough to see why this is. The prior suspicion of the police must be open to doubt; otherwise there would be no point in seeking the woman's evidence at all. What impresses us is agreement between the woman's identification and the independent evidence offered by the police. If the identity parade contains only two men, the witness would have a 50 per cent chance of picking the man already suspected by the police, even if she chose at random, or if she were mistaken. Since the police might also be mistaken, this represents an unacceptably high risk of injustice. But if there are 20 men in the line, the woman has only a 1 in 20 chance of choosing, by guesswork or error, the man the police already suspect. The coincidence of her identification and the police's prior suspicion probably really means something. What is going on here is the assessment of coincidence, or the odds that something might happen by chance alone. The probability of meaningless coincidence is even less if the identity parade has 100 men, because a 1 in 100 chance of error is noticeably less than a 1 in 20 chance of error. The longer the line-up, the more secure the eventual conviction.
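The arithmetic of the line-up can be made concrete with a minimal sketch. It assumes, as the passage does, that a guessing or mistaken witness is equally likely to pick any man in the line. The function name is invented for illustration.

```python
from fractions import Fraction


def chance_of_coincidence(lineup_size: int) -> Fraction:
    """Probability that a witness choosing at random picks the police suspect.

    Assumption (illustrative): every man in the line is equally likely
    to be chosen, so the chance is simply 1 in lineup_size.
    """
    if lineup_size < 1:
        raise ValueError("a line-up needs at least one person")
    return Fraction(1, lineup_size)


# The three line-up sizes discussed in the text.
for n in (2, 20, 100):
    p = chance_of_coincidence(n)
    print(f"{n:>3} men in the line: chance of a meaningless match = {p}")
```

The exact fractions (1/2, 1/20, 1/100) are kept rather than rounded decimals, since the whole argument rests on comparing them.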
We also have an intuitive sense that the men chosen for the line-up must not look too obviously different from the suspect. If the woman originally told the police to look for a man with a beard, and the police have now arrested a bearded suspect, it is clearly unjust to stand him in a line with 19 clean-shaven men. He might as well be standing by himself. Even if the woman has said nothing about the appearance of her attacker, if the police have arrested a punk in a leather jacket it would be wrong to stand him in a line of suited accountants with furled umbrellas. In multiracial countries such considerations have added importance. Everyone understands that a black suspect should not be placed in an otherwise all-white line-up, or vice versa.
When we think about how we identify somebody, the face first leaps to mind. We are particularly good at distinguishing faces. As we shall see in another connection, we even seem to have evolved a special part of the brain set aside for the purpose, and certain kinds of brain damage disable our face-recognition faculty while leaving the rest of vision intact. In any case, faces are good for recognition because they are so variable. With the well-known exception of identical twins, you seldom meet two people whose faces are confusable. It is not totally unknown, however, and an actor can be made up to look very like somebody else. Dictators often employ doubles to perform for them when they are too busy, or to draw the fire of assassins. It has been suggested that one reason charismatic leaders so often sport moustaches (Hitler, Stalin, Franco, Saddam Hussein, Oswald Mosley) is to make it easier for doubles to impersonate them. Mussolini's shaven head perhaps served the same purpose.
Apart from identical twins, ordinary close relatives are sometimes sufficiently alike to fool people who don't know them well. (Unfortunately the story that Doctor Spooner, when Warden of my college, once stopped an undergraduate and said, 'I never can remember, is it you or your brother was killed in the war?' is probably not true, like most alleged Spoonerisms.) The resemblance of brothers and sisters, of fathers and sons, of grandparents and grandchildren, serves to remind us of the huge pool of facial variety in the general population of non-relatives.
But faces are only a special case. We are riddled with idiosyncrasies which, with sufficient training, can be used to identify individuals. I had a schoolfriend who claimed (and my spot checks confirmed it) that he could recognize any member of the 80-strong residence in which we lived purely by listening to their footsteps. I had another friend from Switzerland who claimed that when she walked into a room she could tell, by smell, which members of her circle of acquaintances had recently left the room. It is not that her colleagues didn't wash, just that she was unusually sensitive. That this is in principle possible is confirmed by the fact that police dogs can distinguish between any two human beings by smell alone, with the exception, yet again, of identical twins. As far as I know, the police haven't adopted the following technique, but I bet you could train bloodhounds to track down a kidnapped child after giving them a sample sniff of his brother. A way might even be found to use a jury of bloodhounds to decide paternity cases.
Voices are as idiosyncratic as faces, and various research teams are working on computer voice recognition systems for authenticating identity. It would be a great boon if, in the future, we could dispense with front door keys and rely on a voice-operated computer to obey our personal Open Sesame command. Handwriting is sufficiently individual for the written signature to be used as a guarantee of identity on bank cheques and important legal documents. Signatures are actually not particularly secure because they are too easily forged, but it is still impressive how recognizable handwriting can be. A promising newcomer to the list of individual 'signatures' is the iris of the eye. At least one bank is experimenting with automated iris-scanning machines as a way of verifying identity. The customer stands in front of a camera which photographs the eye and digitizes the image into what a newspaper described as 'a 256 byte human bar code'. But none of these methods of verifying human identity even comes close to the potential of DNA fingerprinting, properly applied.
It is not surprising that police dogs can smell the difference between any two humans except identical twins. Our sweat contains a complicated cocktail of proteins, and the precise details of all proteins are minutely specified by the coded DNA instructions that are our genes. Unlike handwriting and faces, which vary continuously and grade smoothly into one another, genes are digital codes, much like those used in computers. Again with the exception of identical twins, we differ genetically from all other people in discrete, discontinuous ways: an exact number of ways that you could even count if you had the patience. The DNA in each one of my cells (give or take a tiny minority of mistakes, and not including red blood cells which have lost all their DNA, or reproductive cells which contain a random half of my genes) is identical to the DNA in all my other cells. It differs from the DNA in every one of your cells, not in some vague, impressionistic way but at a precise number of locations dotted along the billions of DNA letters that we both have.
It is almost impossible to exaggerate the importance of the digital revolution in molecular genetics. Before Watson and Crick's epochal announcement in 1953 of the structure of DNA, it was still possible to agree with the concluding words of Charles Singer's authoritative A Short History of Biology, published in 1931:
... despite interpretations to the contrary, the theory of the gene is not a 'mechanist' theory. The gene is no more comprehensible as a chemical or physical entity than is the cell or, for that matter, the organism itself. Further, though the theory speaks in terms of genes as the atomic theory speaks in terms of atoms, it must be remembered that there is a fundamental distinction between the two theories. Atoms exist independently, and their properties as such can be examined. They can even be isolated. Though we cannot see them, we can deal with them under various conditions and in various combinations. We can deal with them individually. Not so the gene. It exists only as a part of the chromosome, and the chromosome only as part of a cell. If I ask for a living chromosome, that is, for the only effective kind of chromosome, no one can give it to me except in its living surroundings any more than he can give me a living arm or leg. The doctrine of the relativity of functions is as true for the gene as it is for any of the organs of the body. They exist and function only in relation to other organs. Thus the last of the biological theories leaves us where the first started, in the presence of a power called life or psyche which is not only of its own kind but unique in each and all of its exhibitions.
This is dramatically, profoundly, hugely wrong. And it really matters. Following Watson and Crick and the revolution that they sparked, a gene can be isolated. It can be purified, bottled, crystallized, read as digitally coded information, printed on a page, fed into a computer, read out again into a test tube and reinserted into an organism where it works exactly as it did before. When the Human Genome Project, which set out to work out the complete gene sequence of a human being, is completed, probably by the year 2003, the full genome will fit comfortably on two standard CD ROM discs, leaving enough space for a textbook of molecular embryology. These two discs could then be sent into outer space, and the human race could go extinct secure in the knowledge that there is now a chance that at some future time and in some distant place, a sufficiently advanced civilization would be able to reconstitute a human being. Meanwhile, back on earth, it is because DNA is deeply and fundamentally digital—because the differences between individuals and between species can be precisely counted, not vaguely and impressionistically measured—that DNA fingerprinting is potentially so powerful.
I assert the uniqueness of each individual's DNA with confidence, but even this is only a statistical judgement. Theoretically, the sexual lottery could throw up the same genetic sequence twice. An 'identical twin' of Isaac Newton could be born tomorrow. But the number of people that would have to be born in order to make this event at all likely would be larger than the number of atoms in the universe.
Unlike our face, voice or handwriting, the DNA in most of our cells stays the same from babyhood to old age, and it cannot be altered by training or cosmetic surgery. Our DNA text has such a huge number of letters that we can precisely quantify the expected number shared by, say, brothers or first cousins as opposed to, say, second cousins or random pairs chosen from the population at large. This makes it useful not only for labelling individuals uniquely and matching them to traces such as blood or semen, but for establishing paternity and other genetic relationships. British law allows people to immigrate if they can prove that their parents are already British citizens. A number of children from the Indian subcontinent have been arrested by sceptical immigration officials. Before the advent of DNA fingerprinting it was often impossible for these unfortunate people to prove their parentage. Now it is easy. All you do is take a sample of blood from the putative parents and compare a particular set of genes with the corresponding set of genes from the child. The verdict is clear and unequivocal, with none of the doubt or fuzziness that creates a need for qualitative judgements. Several young people in Britain today owe their citizenship to DNA technology.
A similar method was used to identify skeletons discovered in Yekaterinburg and suspected of belonging to the executed Russian royal family. Prince Philip, Duke of Edinburgh, whose exact relationship to the Romanovs is known, graciously gave blood, and from this it was possible to establish that the skeletons were indeed those of the Tsar's family. In a more macabre case, a skeleton exhumed in South America was proved to belong to Doctor Josef Mengele, the Nazi war criminal known as the 'Angel of Death'. DNA taken from the bones was compared with blood from Mengele's still-living son, and the identity of the skeleton proved. More recently, a corpse dug up in Berlin has been proved, by the same method, to be that of Martin Bormann, Hitler's deputy, whose disappearance had led to endless legends and rumours and more than 6,000 'sightings' around the world.
Despite the name 'fingerprinting', our DNA, being digital, is even more individually characteristic than the patterns of whorls on our fingers. The name is appropriate because, like true fingerprints, DNA evidence is often inadvertently left behind after a person has departed the scene. DNA can be extracted from a bloodstain on a carpet, from semen inside a rape victim, from a crust of dried nasal mucus on a handkerchief, from sweat or from shed hairs. The DNA in the sample can then be compared with that in the blood taken from a suspect. It is possible to assess, to almost any desired level of probability, whether the sample belongs to a particular person or not.
So, what are the snags? Why is DNA evidence controversial? What is it about this important kind of evidence that makes it possible for lawyers to bamboozle juries into misinterpreting or ignoring it? Why have some courts been moved to the despairing extreme of ruling out this evidence altogether?
There are three major classes of potential problem, one simple, one sophisticated and one silly. I'll come to the silly problem and the more sophisticated difficulties later but first, as with any kind of evidence, there is the simple—and very important—possibility of human error. Possibilities, rather, for there are plenty of opportunities for mistakes and even sabotage. A tube of blood may be mislabelled, either by accident or in a deliberate attempt to frame somebody. A sample from the scene of a crime may be contaminated by sweat from a lab technician or a police officer. The danger of contamination is especially great in those cases where an ingenious technique of amplification called PCR (polymerase chain reaction) is used.
You can easily see why amplification might be desirable. A tiny smear of sweat on a gun butt contains precious little DNA. Sensitive though DNA analysis can be, it needs a certain minimum quantity of material to work on. The technique of PCR, invented in 1983 by the American biochemist Kary B. Mullis, is the dramatically successful answer. PCR takes what little DNA there is and produces millions of copies, multiplying again and again whatever code sequences are there. But, as always with amplification, errors are amplified along with the true signal. Stray scraps of DNA contamination from a technician's sweat are amplified as effectively as the specimen from the scene of the crime, with obvious possibilities for injustice.
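Why amplification cannot clean up a contaminated sample can be shown with a small idealized sketch. It assumes that every DNA template present doubles in each PCR cycle, which is a simplification, and the starting copy numbers are invented for illustration.

```python
from fractions import Fraction


def amplify(copies: int, cycles: int) -> int:
    """Idealized PCR: every template present doubles in each cycle (assumption)."""
    return copies * 2 ** cycles


# Hypothetical starting material: a few copies from the crime scene
# and a single stray copy from a technician's sweat.
specimen_start, contaminant_start = 10, 1
cycles = 20

specimen_end = amplify(specimen_start, cycles)
contaminant_end = amplify(contaminant_start, cycles)

share_before = Fraction(contaminant_start, contaminant_start + specimen_start)
share_after = Fraction(contaminant_end, contaminant_end + specimen_end)

print(f"contaminant copies after {cycles} cycles: {contaminant_end:,}")
print(f"contaminant share before: {share_before}, after: {share_after}")
```

Under this assumption the single stray copy becomes more than a million copies, and its share of the sample is exactly what it was before amplification. The error is amplified along with the true signal.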
But human error is not peculiar to DNA evidence. All kinds of evidence are vulnerable to bungling and sabotage, and must be handled with scrupulous care. The files in a conventional fingerprint library may be mislabelled. The murder weapon may have been touched by innocent people as well as the murderer, and their fingerprints have to be taken, along with the suspect's, for elimination purposes. Courts of law are already accustomed to the need to take all possible precautions against mistakes and they still, sometimes tragically, happen. DNA evidence is not immune to human bungling but nor is it particularly vulnerable, except in so far as PCR amplifies error. If all DNA evidence were to be thrown out because of occasional mistakes, the precedent should rule out most other kinds of evidence, too. We have to suppose that codes of practice and rigorous precautions can be developed to guard against human error in the presentation of all kinds of legal evidence.
The more sophisticated difficulties that bedevil DNA evidence will take longer to explain. They, too, have their precedents in conventional types of evidence, although this point often does not seem to be understood in law courts.
Where identification evidence of any kind is concerned, there are two types of error which correspond to the two types of error in any statistical evidence. In another chapter, we shall call them Type 1 and Type 2 errors, but it is easier to think of them as false positive and false negative. A guilty suspect may escape, through not being recognized—false negative. And—false positive (which most people would see as the more dangerous error)—an innocent suspect may be convicted because he happens, by ill luck, to resemble the genuinely guilty party. In the case of ordinary eye-witness identification, an innocent bystander who happens to look a bit like the real criminal could consequently be arrested—false positive. Identity parades are designed to make this less probable. The chance of a miscarriage of justice is inversely related to the number of people standing in the line-up. The danger can be increased in the ways we have already considered—the line-up being unfairly stacked with clean-shaven men for example.
In the case of DNA evidence the danger of a false positive conviction is theoretically very low indeed. We have a blood sample from a suspect, and we have a specimen from the scene of the crime. If the entire set of genes in both these samples could be written down, the probability of a false conviction is one in billions and billions. Identical twins apart, the chance that any two humans would match all their DNA is tantamount to zero. But unfortunately it is not practical to work out the complete gene sequence of a human being. Even after the Human Genome Project is completed, to attempt the equivalent in the solution of each crime is unrealistic. In practice, forensic detectives concentrate on small sections of the genome, preferably sections that are known to vary in the population. And now our fear must be that, although we could safely rule out misidentification if the whole genome were considered, there might be a danger of two individuals' being identical with respect to the small portion of DNA that we have time to analyse.
The probability that this would happen ought to be measurable for any particular section of the genome; we could then decide whether it was an acceptable risk. The larger the section of DNA, the smaller the probability of error, just as, in an identity parade, the longer the line-up the safer the conviction. The difference is that an identity parade, in order to compete with the DNA equivalent, would need to contain not a couple of dozen people but thousands, millions or even billions in the line. Apart from this quantitative difference, the analogy with the identity parade continues. We shall see that there is a DNA equivalent of our hypothetical line-up of clean-shaven men with one bearded suspect. But first, a little more background on DNA fingerprinting.
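The comparison with a very long line-up can be sketched numerically. The sketch makes a strong assumption that is not established in the text: that the sampled sections vary independently of one another, so their chance-match probabilities simply multiply. The per-section figure of 1 in 10 is invented for illustration.

```python
from fractions import Fraction
from math import prod


def chance_match_probability(per_section_probs):
    """Chance that an unrelated person matches at every sampled section.

    Assumption (illustrative): sections are statistically independent,
    so the per-section probabilities multiply.
    """
    for p in per_section_probs:
        if not 0 < p <= 1:
            raise ValueError("each probability must lie in (0, 1]")
    return prod(per_section_probs)


def equivalent_lineup_size(match_probability):
    """Size of an identity parade giving the same chance of coincidence."""
    return 1 / match_probability


# Hypothetical: each section is shared by chance with 1 person in 10.
for sections in (1, 3, 5, 9):
    p = chance_match_probability([Fraction(1, 10)] * sections)
    print(f"{sections} sections: chance match {p}, "
          f"equivalent line-up of {equivalent_lineup_size(p)} people")
```

Under these assumptions nine sections already correspond to a line-up of a billion people. If the sections were not independent, the true probability of a chance match would be higher than this product suggests.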
Obviously we sample the equivalent parts of the genome in both suspect and specimen. These parts of the genome are chosen for their tendency to vary widely in the population. A Darwinian would note that the parts that don't vary are often the parts that have an important role to play in the survival of the organism. Any substantial variations in these important genes are likely to have been removed from the population by the death of their possessors—Darwinian natural selection. But there are other parts of the genome that are very variable, perhaps because they are not important for survival. This isn't the whole story because in fact some useful genes are quite variable. The reasons for this are controversial. It's a bit of a digression but ... What is this life if, full of stress, we have no freedom to digress?
The 'neutralist' school of thought, associated with the distinguished Japanese geneticist Motoo Kimura, believes that useful genes are equally useful in a variety of different forms. This emphatically does not mean that they are useless, only that the different forms are equally good at what they do. If you think of genes as writing out their recipes in words, the alternative forms of a gene can be thought of as the very same words written in different typefaces: the meaning is the same, and the product of the recipe will come out the same. Genetic changes, 'mutations', that make no difference are not 'seen' by natural selection. They aren't mutations at all, for all the difference they make to the life of the animal, but they are potentially useful mutations from the point of view of the forensic scientist. The population ends up with lots of variety at such a locus (position in a chromosome), and this kind of variety could in principle be used for fingerprinting.
The other theory of variation, opposed to Kimura's neutral theory, believes that the different versions of the genes really do different things and that there is some special reason why both are preserved by natural selection in the population. For example, there might be two alternative forms of a blood protein, α and β, which are susceptible to two infectious diseases called alfluenza and betaccosis respectively, each being immune to the other disease. Typically, an infectious disease needs a critical density of susceptible victims in a population, otherwise an epidemic can't get going. In a population dominated by α types, there are frequent epidemics of alfluenza but not of betaccosis. So natural selection favours the β types who are immune to alfluenza. It favours them so much that after a while they come to dominate the population. Now the tables are turned. There are epidemics of betaccosis, but not of alfluenza. The α types now are favoured by natural selection because they are immune to betaccosis. The population may keep oscillating between α dominance and β dominance, or it may settle down to an intermediate mixture, an 'equilibrium'. Either way, we'll see plenty of variation at the gene locus concerned, and this is good news for the fingerprinters. The phenomenon is called 'frequency dependent selection' and it is one suggested reason for high levels of genetic variation in the population. There are others.
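The logic of frequency dependent selection can be illustrated with a toy model. Everything quantitative here is an assumption made for the sketch: each type's fitness is taken to fall linearly with its own frequency, and the selection strength of 0.5 is arbitrary. It is not a model of any real disease.

```python
def next_generation(alpha_freq: float, strength: float = 0.5) -> float:
    """One generation of a toy frequency dependent selection model.

    Assumption (illustrative, not from the text): each type's fitness falls
    linearly with its own frequency, because its disease spreads more
    easily when susceptible hosts are common.
    """
    beta_freq = 1.0 - alpha_freq
    alpha_fitness = 1.0 - strength * alpha_freq  # alfluenza hits common alpha types
    beta_fitness = 1.0 - strength * beta_freq    # betaccosis hits common beta types
    mean_fitness = alpha_freq * alpha_fitness + beta_freq * beta_fitness
    return alpha_freq * alpha_fitness / mean_fitness


def simulate(alpha_freq: float, generations: int, strength: float = 0.5) -> list[float]:
    """Return the alpha frequency in every generation, starting value included."""
    history = [alpha_freq]
    for _ in range(generations):
        alpha_freq = next_generation(alpha_freq, strength)
        history.append(alpha_freq)
    return history


# Start with alpha types dominating, as in the example in the text.
history = simulate(alpha_freq=0.9, generations=200)
print(f"start: {history[0]:.3f}, after 200 generations: {history[-1]:.3f}")
```

In this version the population settles at an even mixture, the 'equilibrium' outcome, whichever type starts in the majority. Oscillation would need something the sketch leaves out, such as a delay between a type becoming common and its epidemic taking hold.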
However, for our forensic purposes, it matters only that there are variable sections of the genome. Whatever the verdict in the controversy over whether the useful bits of the genome are variable, there are in any case lots of other regions of the genome which are never even read, or never translated into their protein equivalents. Indeed, an astonishingly high proportion of our genes seem to be doing nothing whatsoever. They are therefore free to vary, which makes them excellent DNA fingerprinting material.
As if to confirm the fact that a great deal of DNA is doing nothing useful, the sheer quantity of DNA in the cells of different kinds of organisms is wildly variable. Since DNA information is digital, we can measure it in the same kind of units as we measure computer information. One bit of information is enough to specify one yes/no decision: a 1 or a 0, a true or a false. The computer on which I am writing this has 256 megabits (32 megabytes) of core memory. (The first computer that I owned was a bigger box but had less than one five thousandth of the memory capacity.) The equivalent fundamental unit in DNA is the nucleotide base. Since there are 4 possible bases, the information content of each base is equivalent to 2 bits. The common gut bacterium Escherichia coli has a genome of 4 megabases or 8 megabits. The crested newt, Triturus cristatus, has 40,000 megabits. The 5,000-fold ratio between crested newt and bacterium is about the same as that between my present computer and my first one. We humans have 3,000 megabases or 6,000 megabits. This is 750 times as great as the bacterium (which satisfies our vanity), but what are we to make of the newt trumping us sixfold? We'd like to think that genome size is not strictly proportional to what it does: presumably quite a lot of that newt DNA isn't doing anything. This is certainly true. It is also true of most of our DNA. We know from other evidence that, of the 3,000 megabase human genome, only about 2 per cent is actually used for coding protein synthesis. The rest is often called junk DNA. Presumably the crested newt has an even higher percentage of junk DNA. Other newts have not.
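The figures in this comparison can be checked with a few lines of arithmetic. The sketch uses only the numbers quoted above; the variable and function names are invented for illustration.

```python
from math import log2

BASES = "ACGT"                    # the four possible nucleotide bases
BITS_PER_BASE = log2(len(BASES))  # four alternatives carry 2 bits each


def megabases_to_megabits(megabases: float) -> float:
    """Convert a genome size in megabases to its information content in megabits."""
    return megabases * BITS_PER_BASE


e_coli_megabits = megabases_to_megabits(4)
human_megabits = megabases_to_megabits(3000)
newt_megabits = 40_000            # quoted directly in megabits in the text

print(f"E. coli: {e_coli_megabits:,.0f} megabits")
print(f"human:   {human_megabits:,.0f} megabits")
print(f"newt / E. coli:  {newt_megabits / e_coli_megabits:,.0f}-fold")
print(f"human / E. coli: {human_megabits / e_coli_megabits:,.0f}-fold")
print(f"newt / human:    {newt_megabits / human_megabits:.1f}-fold")
```

The newt-to-human ratio comes out between six and seven, which is the 'sixfold' of the text in round numbers.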
The surplus of unused DNA falls into various categories. Some of it looks like real genetic information, and probably represents old, defunct genes, or out-of-date copies of genes that are still in use. These pseudo-genes would make sense if they were read and translated. But they are not read and translated. Hard disks on computers usually contain comparable junk: old copies of work in progress, scratchpad space used by the computer for interim operations, and so on. We users don't see this junk, because our computers only show us those parts of the disk that we need to know about. But if you get right down and read the actual information on the disk, byte by byte, you'll see the junk, and much of it will make some sort of sense. There are probably dozens of disjointed fragments of this very chapter peppered around my hard disk at present, although there is only one 'official' copy that the computer tells me about (plus a prudent back-up).
In addition to the junk DNA which could be read but isn't, there is plenty of junk DNA which not only isn't read but wouldn't make any sense if it were. There are huge stretches of repeated nonsense, perhaps repeats of one base, or alternations of the same two bases, or repeats of a more complicated pattern. Unlike the other class of junk DNA, we cannot account for these 'tandem repeats' as outdated copies of useful genes. This repetitive DNA has never been decoded, and presumably has never been of any use. (Never useful for the animal's survival, anyway. From the point of view of the selfish gene, as I explained in another book, we could say that any kind of junk DNA is 'useful' to itself if it just keeps surviving and making more copies of itself. This suggestion has come to be known by the catchphrase 'selfish DNA', although this is a little unfortunate because, in my original sense, working DNA is selfish too. For this reason, some people have taken to calling it 'ultraselfish DNA'.)
Anyway, whatever the reason, junk DNA is there, and there in prodigious quantities. Because it is not used, it is free to vary. Useful genes, as we have seen, are severely constrained in their freedom to change. Most changes (mutations) make a gene work less effectively, the animal dies and the change is not passed on. This is what Darwinian natural selection is all about. But mutations in junk DNA (mostly changes in the number of repeats in a given region) are not noticed by natural selection. So, as we look around the population, we find most of the variation that is useful for fingerprinting in the junk regions. As we shall now see, tandem repeats are particularly useful because they vary with respect to number of repeats, a gross feature which is easy to measure.
If it wasn't for this, the forensic geneticist would need to look at the exact sequence of bases in our sample region. This can be done, but sequencing DNA is time-consuming. The tandem repeats allow us to use cunning short-cuts, as discovered by Alec Jeffreys of the University of Leicester, rightly regarded as the father of DNA fingerprinting (and now Sir Alec). Different people have different numbers of tandem repeats in particular places. I might have 147 repeats of a particular piece of nonsense, where you have 84 repeats of the same piece of nonsense in the corresponding place in your genome. In another region, I might have 24 repeats of a particular piece of nonsense to your 38 repeats. Each of us has a characteristic fingerprint consisting of a set of numbers. Each of these numbers in our fingerprint is the number of times a particular piece of nonsense is repeated in our genome.
We get our tandem repeats from our parents. We each have 46 chromosomes, 23 from our father and 23 homologous, or corresponding, chromosomes from our mother. These chromosomes come complete with tandem repeats. Your father got his 46 chromosomes from your paternal grandparents, but he didn't pass them on to you in their entirety. Each of his mother's chromosomes was lined up with its paternal opposite number and bits were exchanged before a composite chromosome was put into the sperm that helped to make you. Every sperm and every egg is unique because it is a different mix of maternal and paternal chromosomes. The mixing process affects the tandem repeat sections as well as the meaningful sections of the chromosomes. So our characteristic numbers of tandem repeats are inherited, in much the same way as our eye colour and hair curliness are inherited. With the difference that, whereas our eye colour results from some kind of joint verdict of our paternal and our maternal genes, our tandem repeat numbers are properties of the chromosomes themselves and can therefore be measured separately for paternal and maternal chromosomes. At any particular tandem repeat region, each of us has two readings: a paternal chromosome repeat number and a maternal chromosome repeat number. From time to time, chromosomes mutate—suffer a random change—in their tandem repeat numbers. Or a particular tandem region may be split by chromosomal crossing over. This is why there is variation in tandem repeat numbers in the population. The beauty of tandem repeat numbers is that they are easy to measure. You don't have to get embroiled in detailed sequencing of coded DNA bases. You do something a bit like weighing them. Or, to take another equally apt analogy, you spread them out like coloured bands from a prism. I'll explain one way of doing this.
First you need to make some preparations. You make a so-called DNA probe, which is a short sequence of DNA that exactly matches the nonsense sequence in question—up to about 20 nucleotide bases long. This is not difficult to do nowadays. There are several methods. You can even buy a machine off the shelf which makes short DNA sequences to any specification, just as you can buy a keyboard to punch any desired string of letters on a paper tape. By supplying the synthesizing machine with radioactive raw materials, you make the probes themselves radioactive, and so 'label' them. This makes the probes easy to find again later, as natural DNA is not radioactive, and so the two are readily distinguishable from each other.
Radioactive probes are a tool of the trade, which you must have ready before you start a Jeffreys fingerprinting exercise. Another essential tool is the 'restriction enzyme'. Restriction enzymes are chemical tools that specialize in cutting DNA, but cutting it only in particular places. For example, one restriction enzyme may search the length of a chromosome until it finds the sequence GAATTC (G, C, T and A are the four letters of the DNA alphabet; all genes, from all species on earth, differ only in consisting of different sequences of these four letters). Another restriction enzyme cuts the DNA wherever it can find the sequence GCGGCCGC. A number of different restriction enzymes are available in the toolbox of the molecular biologist. They originate from bacteria, who use them for their own defensive purposes. Each restriction enzyme has its own unique search string which it homes in on and cuts.
Now, the trick is to choose a restriction enzyme whose specific search string is completely absent from the tandem repeat we are interested in. The whole length of DNA is therefore chopped into short stretches, bounded by the characteristic search string of the restriction enzyme. Of course, not all the stretches will consist of the tandem repeat we are looking for. All sorts of other stretches of DNA will happen to be bounded by the favoured search string of the restriction enzyme scissors. But some of them will consist of tandem repeats and the length of each scissored stretch will be largely determined by the number of tandem repeats in it. If I have 147 repeats of a particular piece of DNA nonsense, where you have only 84, my snipped fragments will be correspondingly longer than your snipped fragments.
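A toy model may make the scissoring clearer. What follows is a sketch under loud assumptions, not a description of real laboratory practice: the flanking sequences, the two-letter repeat unit `CA` and the helper names are all invented for illustration, and the 'enzyme' simply splits a text string at GAATTC, whereas a real enzyme cuts at a fixed point within its search string. The repeat counts, 147 and 84, are the ones used earlier in the chapter.

```python
SITE = "GAATTC"  # one of the search strings mentioned in the text

def make_chromosome(n_repeats: int, unit: str = "CA") -> str:
    """Build an invented stretch of DNA: a tandem repeat bounded by two cut sites."""
    left_flank = "TTGACCGTA"    # hypothetical flanking sequence
    right_flank = "GGTACCTTA"   # hypothetical flanking sequence
    repeat_region = "AT" + unit * n_repeats + "GC"
    return left_flank + SITE + repeat_region + SITE + right_flank

def digest(dna: str, site: str = SITE) -> list[str]:
    """Toy digest: split the DNA wherever the search string occurs.

    The site itself is dropped here, which is a simplification.
    """
    return [fragment for fragment in dna.split(site) if fragment]

my_fragments = digest(make_chromosome(147))
your_fragments = digest(make_chromosome(84))

# The middle fragment is the one containing the tandem repeat.
print([len(f) for f in my_fragments])
print([len(f) for f in your_fragments])
```

Because the repeat unit contains no GAATTC, the repeat region survives as a single fragment, and its length grows in step with the number of repeats. The flanking fragments are the same length in both samples.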
We can measure these characteristic lengths using a technique that has been around in molecular biology for quite a while. This is the bit that is rather like spreading them out with a prism, as Newton did for white light. The standard DNA 'prism' is a gel electrophoresis column, that is, a long tube filled with jelly through which an electric current is passed. A solution containing the scissored stretches of DNA, all jumbled together, is poured into one end of the tube. The DNA fragments are all electrically attracted to the positive end of the column, which is at the other end of the tube, and they move steadily through the jelly. But they don't all move at the same rate. Like light of low vibration frequency moving through glass, small fragments of DNA move faster than large ones. The result is that, if you switch the current off after a suitable interval, the fragments have spread themselves out along the column, just as Newton's colours spread themselves out because light from the blue end of the spectrum is more readily slowed down by glass than light from the red end.
But so far we can't see the fragments. The jelly column looks uniform all the way down. There is nothing to show that DNA fragments of different size are lurking in discrete bands along its length, and nothing to show which bands contain which variety of tandem repeat. How do we make them visible? This is where the radioactive probes come in.
To make them visible you can use another cunning technique, the Southern blot, named after its inventor, Edward Southern. (Slightly confusingly, there are other techniques called the Northern blot and the Western blot, but no Mr Northern or Mr Western.) The jelly column is removed from the tube and laid out on blotting paper. The liquid in the jelly, including the DNA fragments, seeps out of the jelly into the blotting paper. The blotting paper has previously been laced with quantities of the radioactive probe for the particular tandem repeat that we are interested in. The probe molecules line up along the blotting paper, pairing precisely, by the ordinary rules of DNA, with their opposite numbers in the tandem repeats. Surplus probe molecules are washed away. Now the only radioactive probe molecules left in the blotting paper are those bound to their exact opposite numbers that seeped out of the jelly. The blotting paper is now placed on a piece of X-ray film, which is then marked by the radioactivity. So, what you see when you develop the film is a set of dark bands—another barcode. The final barcode pattern that we read on the Southern blot is a fingerprint for a person, in very much the same way as the Fraunhofer lines are a fingerprint for a star, or the formant lines are the fingerprint for a vowel sound. Indeed, the barcode from the blood looks very like Fraunhofer lines or formant lines.
The details of DNA fingerprinting techniques get quite complicated and I won't go much further. For instance, one strategy is to hit the DNA with lots of probes all at the same time. What you get then is a mixed bag of barcode stripes simultaneously. In extreme cases, the stripes merge into each other and all you get is one big smear with all possible sizes of DNA fragment represented somewhere in the genome. This is no good for identification purposes. At the other extreme, people use only one probe at a time looking at one genetic 'locus'. This 'single-locus fingerprinting' gives you nice clean bars like Fraunhofer lines. But only one or two bars per person. Even so, the chances of confusing people are small. This is because the characteristics we are talking about are not like 'brown eyes versus blue eyes', in which case lots of people would be the same. The characteristics we are measuring, remember, are lengths of tandem repeat fragments. The number of possible lengths is very large, so even single-locus fingerprinting is pretty good for identification purposes. Not quite good enough, however, so in practice forensic DNA fingerprinters usually use half a dozen separate probes. Now the chances of error are very low indeed. But we still need to talk about exactly how low, because people's lives or liberties might depend upon it.
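To see why half a dozen probes help so much, here is a minimal sketch. It assumes, purely for illustration, that two unrelated people have a one in a hundred chance of matching at any single locus and that the loci are independent of one another. Neither figure nor assumption comes from the text, and the function name is hypothetical.

```python
from fractions import Fraction
from math import prod

def chance_match(per_locus: list[Fraction]) -> Fraction:
    """Chance that two unrelated people match at every locus.

    Assumes the loci are statistically independent, so the
    per-locus chances simply multiply.
    """
    return prod(per_locus, start=Fraction(1))

# Hypothetical per-locus chance of a coincidental match.
P_ONE_LOCUS = Fraction(1, 100)

one_probe = chance_match([P_ONE_LOCUS])
six_probes = chance_match([P_ONE_LOCUS] * 6)

print("one probe: ", one_probe)
print("six probes:", six_probes)
```

Under these assumptions each extra probe divides the chance of a coincidental match by a hundred, which is why a handful of loci can take the odds from merely small to very small indeed. If the loci are not independent, or if the suspect and the true culprit are related, the multiplication overstates the case.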
First, we must return to our distinction between false positives and false negatives. DNA evidence can be used to clear an innocent suspect, or it can be made to point the finger at a guilty one. Suppose semen is recovered from the vagina of a rape victim. Circumstantial evidence leads the police to arrest a man, suspect A. Suspect A gives a blood sample and it is compared to the semen sample, using a single DNA probe to look at one tandem repeat locus. If the two are different, suspect A is in the clear. We don't even need to look at a second locus.
But what if suspect A's blood matches the semen sample at this locus? Suppose they both share the same barcode pattern, which we shall call pattern P. This is compatible with the suspect's being guilty, but it doesn't prove it. He could just happen to share pattern P with the real rapist. We must now look at some more loci. If the samples still match, what are the odds against such a match being coincidental—a false positive misidentification? This is where we have to start thinking statistically about the population at large. In theory, by taking blood from a sample of men in the population at large, we should be able to calculate the likelihood that any two men will be identical at each locus concerned. But from which section of the population do we draw our sample?
Remember our lone bearded man in the old-fashioned line-up identity parade? Here's the molecular equivalent. Suppose that, in the world at large, only one in a million men has pattern P. Does this mean that there is a million to one chance against a wrongful conviction of suspect A? No. Suspect A may belong to a minority group of people whose ancestors immigrated from a particular part of the world. Local populations often share genetic peculiarities, for the simple reason that they are descended from the same ancestors. Of the 2.5 million South African Dutch, or Afrikaners, most are descended from one shipload of immigrants who arrived from the Netherlands in 1652. As an indicator of the narrowness of this genetic bottleneck, about a million still bear the surnames of 20 of these original settlers. The Afrikaners have a much higher frequency of certain genetic diseases than the population of the world in general. According to one estimate, about 8,000 (one in 300) have the blood condition porphyria variegata, which is much rarer in the rest of the world. This is apparently because they are descended from one particular couple on the ship, Gerrit Jansz and Ariaantje Jacobs, although it is not known which one was the carrier of the (dominant) gene for the condition. (She was one of eight Rotterdam orphanage girls put on the ship to provide wives for the settlers.) In fact, the condition wasn't noticed at all before modern medicine, because its most marked symptom is a lethal reaction to certain modern anaesthetics (South African hospitals now routinely test for the gene before administering anaesthetic). Other populations often have locally high frequencies of other particular genes, for the same kind of reason. If, to return to our hypothetical court case, suspect A and the real criminal both belong to the same minority group, the likelihood of chance confusion could be dramatically greater than you'd think if you based your estimates on the population at large. 
The point is that the frequency of pattern P in humans at large is no longer relevant. We need to know the frequency of pattern P in the group to which the suspect belongs.
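A small calculation shows how much the choice of reference population matters. This is a sketch only: the one in a million figure is the one supposed in the text, while the subgroup frequency, the subgroup size and the function name are hypothetical numbers and names chosen for illustration.

```python
from fractions import Fraction

def expected_sharers(frequency: Fraction, group_size: int) -> Fraction:
    """Expected number of men in a group who carry pattern P."""
    return frequency * group_size

world_frequency = Fraction(1, 1_000_000)  # supposed in the text
subgroup_frequency = Fraction(1, 1_000)   # hypothetical: P is common locally
subgroup_men = 100_000                    # hypothetical size of the minority group

# What we would expect if we wrongly applied the worldwide figure.
naive = expected_sharers(world_frequency, subgroup_men)
# What we expect using the frequency within the group itself.
realistic = expected_sharers(subgroup_frequency, subgroup_men)

print("using world frequency:   ", float(naive))
print("using subgroup frequency:", float(realistic))
```

With the worldwide figure we would expect a fraction of one man in the group to share the pattern. With the local figure we expect a hundred, any one of whom could be the real culprit.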
This need is nothing new. We've already seen the equivalent danger in an ordinary line-up identity parade. If the prime suspect is Chinese, it doesn't do to stand him in a line-up largely consisting of westerners. And the same kind of statistical reasoning about the background population is needed in identifying stolen goods, as well as individual suspects. I have already mentioned my jury service in the Oxford Court. In one of the three cases I sat on, a man was accused of stealing three coins from a rival numismatist. The accused had been caught with three coins in his possession which matched those lost. Counsel for the prosecution was eloquent.
Ladies and gentlemen of the jury, are we really supposed to believe that three coins, of exactly the same type as the three missing coins, would just happen to be present in the house of a rival collector? I put it to you that such a coincidence is too much to stomach.
Jurymen are not permitted to cross-examine. That was the duty of counsel for the defence, and he, though doubtless learned in the law and also eloquent, had no more clue about probability theory than the prosecutor. I wish he'd said something like this:
M'Lud, we don't know whether the coincidence is too much to stomach, because m'learned friend has not presented us with any evidence at all as to the rarity or commonness of these three coins in the population at large. If these coins are so rare that only one in a hundred collectors in the country has any one of them, the prosecution has a good case, since the defendant was caught with three of them. If, on the other hand, these coins are as common as dirt, there is not enough evidence to convict. (To push to the extreme, three coins that I have in my pocket today, all current legal tender, are very probably the same as three coins in Your Lordship's pocket.)
My point is that it simply never occurred to any of the legally trained minds in the court that it was relevant even to ask how rare these three coins were in the population at large. Lawyers can certainly add up (I once received a lawyer's bill, the last item of which was 'Time spent making out this bill') but probability theory is another matter.
I expect the coins were actually rare. If they hadn't been, the theft would not have been such a serious matter, and the prosecution presumably would never have been brought. But the jury should have been told explicitly. I remember that the question came up in the jury room, and we wished that we were allowed to go back into the court to seek clarification. The equivalent question is equally relevant in the case of DNA evidence, and it is most certainly being asked. Fortunately, provided a sufficient number of separate genetic loci are examined, the chances of misidentification—even among members of minority groups, even among family members (except identical twins)—can be reduced to genuinely very small levels, far smaller than can be achieved by any other method of identification, including eye-witness evidence.
Exactly how small the residual possibility of error is may still be open to dispute. And this is where we come to the third category of objection to DNA evidence, the just plain silly. Lawyers are accustomed to pouncing when expert witnesses seem to disagree. If two geneticists are summoned to the stand and are asked to estimate the probability of a misidentification with DNA evidence, the first may say 1,000,000 to one while the second may say only 100,000 to one. Pounce. 'Aha! AHA! The experts disagree! Ladies and gentlemen of the jury, what confidence can we place in a scientific method if the experts themselves can't get within a factor of ten of one another? Obviously the only thing to do is throw the entire evidence out, lock, stock and barrel.'
But, in these cases, although geneticists may be inclined to give different weightings to imponderables such as the racial subgroup effect, any disagreement between them is only over whether the odds against a wrongful identification are hyper-mega-astronomical or just plain astronomical. The odds cannot normally be lower than thousands to one, and they may well be up in the billions. Even on the most conservative estimate, the odds against wrongful identification are hugely greater than they are in an ordinary identity parade. 'M'lud, an identity parade of only 20 men is grossly unfair on my client. I demand a line-up of at least a million men!'
Expert statisticians called to give evidence on the likelihood that a conventional 20-man identity parade could yield a false identification would also disagree among themselves. Some would give the simple answer, one in 20. Under cross-examination they would then agree that it could be one in less than 20, depending upon the nature of the variation in the line-up in relation to the features of the suspect (this was the point about the lone bearded man in the line-up). But the one thing all the statisticians would agree upon is that the odds of misidentification by sheer chance are at least one in 20. Yet lawyers and judges are normally happy to go along with ordinary identity parades in which the suspect stands in a line of only 20 men.
After reporting the throwing out of DNA evidence in a case at London's central criminal court the Old Bailey, the Independent newspaper of 12 December 1992 predicted a consequent flood of appeals. The idea is that everybody at present languishing in jail, as a result of DNA identification evidence, will now be able to appeal, citing the precedent. But the flood may be even greater than the Independent imagines because, if this throwing out of DNA evidence is really a serious precedent for anything, it will cast doubt on all cases in which the odds against a chance mistake are less than thousands to one. If a witness says she 'saw' somebody and identified him in a line-up, lawyers and juries are satisfied. But the odds of mistaken identity when the human eye is involved are far greater than when the identification is done by DNA fingerprinting. If we take the precedent seriously, it ought to mean that every convicted criminal in the country will have excellent cause to appeal on grounds of mistaken identity. Even where a suspect was seen by dozens of witnesses with a smoking gun in his hand, the odds of injustice must be greater than one in 1,000,000.
A recent highly publicized case in America, where the jury were systematically confused about DNA evidence, has also become notorious for another piece of bungled probability theory. The defendant, who was known to have beaten his wife, was on trial for finally murdering her. One of the high-profile defence team, a Harvard professor of law, advanced the following argument: Statistics show that of men who beat their wives, only one in 1,000 go on to kill them. The inference that any jury might be expected to draw (indeed, were intended to draw) is that the defendant's beating of his wife should be discounted in the murder trial. Doesn't the evidence show overwhelmingly that a wife-beater is unlikely to turn into a wife murderer? Wrong. Doctor I. J. Good, a professor of statistics, wrote to the scientific journal Nature in June 1995 to explode the fallacy. The defence lawyer's argument overlooks the additional fact that wife-killing is rare compared with wife-beating. Good calculated that if you take that minority of wives who are both beaten by their husbands and murdered by somebody, it is very likely indeed that the murderer will be the husband. This is the relevant way to calculate the odds because, in the case under discussion, the unfortunate wife had been murdered by somebody, after being beaten by her husband.
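The shape of Good's argument can be sketched in a few lines. I must stress that the numbers here are not Good's. The one in 1,000 figure is the defence's, as quoted above. The chance of a battered wife being murdered by somebody other than her husband is a hypothetical value that I have chosen purely for illustration, and the function name is likewise my own.

```python
from fractions import Fraction

def p_husband_given_murdered(p_by_husband: Fraction, p_by_other: Fraction) -> Fraction:
    """Among battered wives who are murdered, the fraction killed by the husband.

    Treats 'killed by husband' and 'killed by somebody else' as
    mutually exclusive, so the two chances can be added.
    """
    return p_by_husband / (p_by_husband + p_by_other)

p_by_husband = Fraction(1, 1000)   # the defence's figure, from the text
p_by_other = Fraction(1, 20_000)   # hypothetical, for illustration only

posterior = p_husband_given_murdered(p_by_husband, p_by_other)
print(posterior, "=", round(float(posterior), 3))
```

On these illustrative figures, about 95 per cent of murdered, battered wives would have been killed by their husbands. The one in 1,000 figure answers the wrong question, because it ignores the fact that a murder has actually taken place.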
No doubt there are lawyers, judges and coroners who could benefit from a better understanding of the theory of probability. On some occasions, however, one cannot help suspecting that they understand very well and are feigning incompetence. I do not know if this was so in the case just quoted. The same suspicion is raised by Doctor Theodore Dalrymple, the (London) Spectator's acerbic medical raconteur, in this typically sardonic account, from 7 January 1995, of his being called as an expert witness in a coroner's court:
...a wealthy and successful man I knew swallowed 200 tablets and a bottle of rum. The coroner asked me whether I thought he might have taken them by accident. I was about to answer with a ringing and confident no, when the coroner made himself a little clearer: was there even a one in a million chance he had taken them by accident? 'Er, well, I suppose so,' I replied. The coroner (and the man's family) relaxed, an open verdict was returned, the family was £750,000 the richer and an insurance company the poorer by an equivalent sum, at least until it put my premium up.
The power of DNA fingerprinting is an aspect of the general power of science that makes some people fear it. It is important not to exacerbate such fears by claiming too much or trying to move too fast. Let me end this rather technical chapter by returning to society and an important and difficult decision that we must collectively make. I would normally fight shy of discussing a topical issue for fear of going out of date, or a local one for fear of being parochial, but the question of a national DNA database is starting to preoccupy most nations in their different ways, and it is bound to become more pressing in the future.
It would in theory be possible to keep a national database of DNA sequences from every man, woman and child in the country. Then, whenever a sample of blood, semen, saliva, skin or hair was found at the scene of a crime, the police would not have to locate a suspect by other means before comparing his DNA with the sample. They could simply do a computer search of the national database. The very suggestion elicits howls of protest. It would be an infringement of individual liberty. It's the thin end of the wedge. A giant step towards a police state. I have always been a little puzzled about why people automatically react so strongly against suggestions such as these. If I examine the matter dispassionately, I think that, on balance, I come out against it. But it is not something to condemn out of hand without even looking at the pros and cons. So let us do so.
If the information is guaranteed to be used only for catching criminals, it is hard to see why anybody who is not a criminal should object. I am aware that plenty of activists for civil liberties will still object in principle. But I genuinely don't understand why, unless we want to protect the rights of criminals to perform crimes without detection. I also see no good reason against a national database of conventional, inkpad fingerprints (except the practical one that, unlike with DNA, it is hard to do an automatic computer search of conventional fingerprints). Crime is a serious problem which diminishes the quality of life for everybody except the criminals (perhaps even them: presumably there is nothing to stop a burglar's house being burgled). If a national DNA database would significantly help the police to catch criminals, the objections had better be good ones to outweigh the benefits.
Here's an important caution, though, to begin with. It's one thing to use DNA evidence, or mass-screening identification evidence of any kind, to corroborate a suspicion that the police have already reached on other grounds. It's quite another matter to use it to arrest anybody in the country who matches the sample. If there is a certain low probability of coincidental resemblance between, say, a semen sample and the blood of an innocent individual, the probability that that individual will also be falsely suspected on independent grounds is obviously far lower. So the technique of simply searching the database and arresting the one person who matches the sample is significantly more likely to lead to injustice than a system which requires other grounds for suspicion first. If a sample from the scene of a crime in Edinburgh happens to match my DNA, should the police be allowed to hammer on my door in Oxford and arrest me on no other evidence? I think not, but it is worth remarking that the police already do something equivalent with facial features, when they release to the national newspapers an Identikit picture, or a snapshot taken by a witness, and invite people from all over the country to telephone them if they 'recognize' the face. Once again, we must beware of our natural tendency to trust facial recognition above all other kinds of individual identification.
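The danger of trawling a database can be put in rough numbers. The sketch below assumes, hypothetically, a database of 50 million people and a one in a million chance that any one innocent person matches the sample by coincidence. Both figures and both function names are invented for illustration.

```python
from fractions import Fraction

def expected_coincidental_matches(database_size: int, p_match: Fraction) -> Fraction:
    """Expected number of innocent people in the database who match by chance."""
    return database_size * p_match

def p_at_least_one(database_size: int, p_match: float) -> float:
    """Chance that at least one innocent person matches, assuming independence."""
    return 1 - (1 - p_match) ** database_size

DATABASE_SIZE = 50_000_000          # hypothetical national database
P_MATCH = Fraction(1, 1_000_000)    # hypothetical chance of a coincidental match

# One suspect, already arrested on other grounds: the chance of a
# coincidental match is just P_MATCH.
print("single suspect:          ", float(P_MATCH))

# Trawling the whole database: many chances for a coincidence.
print("expected chance matches: ", expected_coincidental_matches(DATABASE_SIZE, P_MATCH))
print("chance of at least one:  ", p_at_least_one(DATABASE_SIZE, float(P_MATCH)))
```

A match probability that looks reassuringly small for one suspect chosen on independent grounds still leaves dozens of innocent matches expected when every record in a large database is compared against the sample.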
Setting crime aside, there is a real danger of the information in the national DNA database falling into the wrong hands. I mean into the hands of those who wish to use it not for catching criminals but for other purposes, perhaps connected with medical insurance or blackmail. There are respectable reasons why people with no criminal intent at all might not wish their DNA profile to be known, and it seems to me that their privacy should be respected. For instance, a significant number of individuals who believe they are the father of a particular child are not. Equally, a significant number of children believe somebody to be their real father who is not. Anyone with access to the national DNA database might discover the truth, and the result could be huge emotional distress, marital breakdown, nervous breakdown, blackmail, or worse. There may be some who feel that the truth should always out, however painful, but I think a good case could be made that the sum total of human happiness would not be enhanced by a sudden outburst of revelations about everybody's true paternity.
Then there are the medical and insurance issues. The whole life insurance business depends upon the inability to forecast exactly when somebody will die. As Sir Arthur Eddington said: 'Human life is proverbially uncertain; few things are more certain than the solvency of a life-insurance company.' We all pay our premiums. Those of us who die later than expected subsidize (the heirs of) those who die earlier than expected. Insurance companies already make statistical guesses which partially subvert the system by enabling them to charge high-risk clients larger premiums. They send a doctor to listen to our hearts, take our blood pressure and investigate our smoking and drinking habits. If actuaries knew exactly when we were all going to die, life insurance would become impossible. In principle, a national DNA database, if actuaries could get their hands on it, might lead us closer to this unfortunate outcome. An extreme could be reached where the only kind of death risk that could be insured against would be pure accident.
Similarly, people screening job applicants, or applicants for places at university, could use DNA information in ways that many of us might find undesirable. Some employers already use dubious methods such as graphology (analysis of handwriting as a supposed guide to character or aptitude). Unlike the case of graphology, there is good reason to think that DNA information might be genuinely useful for judging abilities. But still, I would be one of many who would be disturbed if selection panels made use of DNA information, at least if they did so secretly.
One of the general arguments against national databases of any kind is the 'What if it fell into the hands of a Hitler?' argument. On the face of it, it is not clear how an evil government would benefit from a database of true information about people. They are so adept at using false information, one might say, why should they bother to abuse true information? In the case of Hitler, however, there is the point about his campaign against Jews and others. Although it is not true that you can recognize a Jew from his DNA, there are particular genes which are characteristic of people whose ancestors come from certain regions of, say, central Europe, and there are statistical correlations between possession of certain genes and being Jewish. It seems undeniable that, if Hitler's regime had had a national DNA database at their disposal, they would have found terrible ways to abuse it.
Are there ways to safeguard society from these potential ills, while retaining the benefit of helping to catch criminals? I'm not sure. I think it might be difficult. You could protect honest citizens against insurance companies and employers by restricting the national database to non-coding regions of the genome. The database would refer only to tandem repeat areas of the genome, not genes that actually do anything. This would prevent actuaries working out our life expectancy and talent scouts second-guessing our abilities. But it would do nothing to protect us against discovering (or against blackmailers discovering) truths about paternity that we might prefer not to know. Quite the contrary. The identification of Josef Mengele's bones from his son's blood was entirely based upon tandem repeat DNA. I see no easy answer to this objection except to say that, as DNA testing becomes easier, it will increasingly be possible to discover paternity in any case, without recourse to a national database. A man who suspects that 'his' child is not really his could already take the child's blood and have it compared with his own. He wouldn't need a national database.
Not just in courts of law, the decisions of commissions of inquiry and other bodies charged with discovering what happened in some incident or accident frequently turn upon scientific matters. Scientists are called as expert witnesses on factual matters: on the technicalities of metal fatigue, on the infectivity of mad cow disease, and so on. Then, having delivered their expertise, the scientists are dismissed so those charged with the serious business of actually making the decisions can get on with it. The implication is that scientists are good at discovering detailed facts but others, often lawyers or judges, are better qualified to integrate them and recommend what needs to be done. On the contrary, a good case can be made that scientific ways of thinking are valuable, not just for assembling the detailed facts but for reaching the final verdict. When there has been an air crash, say, or a disastrous football riot, a scientist might be better qualified to chair the inquiry than a judge, not because of what scientists know, but because of the methods they use to find things out and make decisions.
The case of DNA fingerprinting suggests that lawyers would be better lawyers, judges better judges, parliamentarians better parliamentarians and citizens better citizens if they knew more science and, more to the point, if they reasoned more like scientists. This is not only because scientists value reaching the truth above winning a case. Judges, and decision-takers in general, might be better decision-takers if they were more adept in the arts of statistical reasoning and probability assessment. This point will resurface in the next two chapters, which deal with superstition and the so-called paranormal.