When Science Goes Wrong: Twelve Tales From the Dark Side of Discovery - Simon LeVay (2008)


ON THE MORNING OF January 17, 1939, a 23-year-old graduate student named Mary Tudor set out with five colleagues from the campus of the University of Iowa at Iowa City. Their destination was the Iowa Soldiers and Sailors Orphans’ Home in the town of Davenport, 50 miles to the east on the Mississippi River. Tudor’s mission was to discover the cause of stuttering. She didn’t accomplish her mission, but the methods she used in her attempt to do so eventually became the subject of a multimillion-dollar lawsuit and fierce ethical controversy.

The experiment that Tudor planned to perform on the orphanage children was directed by her advisor, Wendell Johnson, a 32-year-old assistant professor in the Departments of Psychology, Speech, and Child Welfare at the University of Iowa. Johnson, who died in 1965, devoted his life to the study of stuttering. He was largely responsible for transforming stuttering from a risible handicap to a topic of serious academic and clinical research, and many of the leading experts in the field today are his students or ‘grandstudents’. Thus, although the particular theories he espoused may not have stood the test of time, he is revered as a founder of the field and a long-time advocate for people with speech disorders.

Johnson himself stuttered. In a 1930 book, he described how he began to stutter at the age of five, after several years of normal childhood speech. Nothing in particular seems to have provoked the onset of the disability – no physical illness, personal loss, or traumatic experience. Nor were there other stutterers in Johnson’s family. In fact, Johnson’s childhood was quite typical for a boy born into a rural Kansas household in the early 1900s.

During his first few years of stuttering, the trait caused him relatively little hardship. He was held back a year in school on account of his disability, but he was not punished for it or subjected to any unusually harsh remedies. For the most part, his family and his few playmates tolerated his stuttering amicably; they would sometimes help him out by completing the words that he stumbled on. In fact, his stuttering helped motivate him to shine in ways that compensated for the disability – in academics, in sports and in jovial social interactions. Thus, paradoxically, the experience of being a person who stuttered may have been a positive factor in Johnson’s career and in his personal life.

At the age of 16, Johnson was subjected to the first serious effort to eliminate his stutter: he was sent for the summer to a residential school that offered a programme for this purpose. Johnson and his fellow students were taught to speak in a slow, drawling monotone. They were taught to speak rhythmically, while swinging Indian clubs or doing other exercises to set the rhythm. They were taught to ignore other people’s negative reactions to their stuttering. And they tried to follow the director’s exhortation to ‘Use your will power. Don’t give up. Be the master of your fate and the captain of your soul!’

And to a point, it worked. Johnson became less afraid of stuttering, and perhaps as a consequence he stuttered less or not at all – within the protective environs of the school. As soon as he re-entered the wider world, however, his stutter returned with full ferocity.

After completing high school with honours – he was class valedictorian – Johnson attended nearby McPherson College. While there, he learned that a programme of research on stuttering was being started at the University of Iowa’s Speech Clinic, and he transferred to that university in 1926. Johnson recounts how, as one of his first tasks there, he had to read aloud for five minutes before a class of students: during that time he managed to get only four words out of his mouth. In spite of that inauspicious start, Johnson remained at the University of Iowa for his entire life. He obtained a BA in 1928 and a PhD in psychology in 1931, and was appointed assistant professor in 1937 and full professor in 1945.

During his early years at the university, Johnson’s advisors suspected that stuttering resulted from a developmental miswiring of the brain – specifically, an error in the assignment of functions to the left and right hemispheres of the cerebral cortex. They came up with the idea that Johnson, who to all appearances was right-handed, would be better off left-handed. For several years, therefore, Johnson was equipped with a variety of devices that prevented him from using his right hand and forced him to use his left. Other stuttering students were similarly outfitted, and they became a familiar sight on campus, their good arms bandaged or tied back as they struggled to perform life’s tasks with their clumsier arms.

This and a variety of other experiments failed to cure Johnson or his fellow students of their stutters. And, gradually, Johnson began to reject the general theory that stuttering was caused by some inborn miswiring of the brain, and to consider a very different set of ideas based on the premise that social interactions were the key to the disorder.

Most central to Johnson’s new thinking was the notion that the very act of labelling a child a stutterer might turn him or her into one. In part, this idea was influenced by Johnson’s reading of the work of the Polish-American psychologist Alfred Korzybski, founder of a field he called General Semantics. Korzybski was interested in the impact of labels on people’s perceptions of things. According to one anecdote, for example, he shared some biscuits with his students, and after they had eaten them showed them the package, which read DOG COOKIES. This caused some of the students, who had previously enjoyed the biscuits, to rush to the toilet to vomit. Interestingly, this demonstration brought up – in miniature – some of the same ethical concerns that later plagued the Tudor study.

But why would anyone label a child a stutterer if he or she did not already stutter? According to Johnson, it was because all young children mangle their speech to a certain extent. These normal childhood ‘disfluencies’ are typically ignored by parents, teachers and peers, and they disappear over time as the child’s speaking skills improve. Some parents, however, develop an inordinate concern with their children’s disfluencies, Johnson believed. They become obsessed with the notion that the child is beginning to stutter, and they communicate that obsession to the child, calling him or her a stutterer and drawing unnecessary and repeated attention to every inconsequential error of speech that the child may commit. In doing so, the parents do not merely impose on the child the identity of a stutterer, but they also inculcate in the child an intense fear of stuttering. Thus the child, in speaking, anticipates making errors, becomes increasingly tense and fights his or her own vocal organs. This inner battle leads to the syllable repetitions, prolongations and other phenomena that are the behavioural hallmarks of stuttering. Johnson’s ideas became known as the ‘diagnosogenic’ theory of stuttering, so called because the trait originates in the very act of diagnosis.

Was Johnson’s diagnosogenic theory the trigger for the Tudor study? Not according to Ehud Yairi, Professor Emeritus in the Department of Speech and Hearing Science at the University of Illinois at Urbana-Champaign. Yairi is an expert on stuttering who entered the University of Iowa as a graduate student shortly after Wendell Johnson’s death. In a 2005 article, he concluded that the diagnosogenic theory could not have been well enough established in Johnson’s mind to have been the basis for the 1939 study, because Johnson’s first published account of it appeared in 1942, three years after the Tudor study was completed. At the time of the study, Yairi argued, Johnson was still thinking in terms of neurological explanations for the disorder.

Yet other experts have reached different conclusions. One of these is Nicoline Ambrose, an associate professor in the same department as Yairi, who collaborated with him on a detailed reanalysis of the Tudor study published in 2002. When I asked Ambrose in a 2006 interview whether she thought that the Tudor study was intended as a test of Johnson’s diagnosogenic theory, she said, ‘I would basically say yes, or some earlier version of it, as it was being formulated – although I don’t believe the intent was to create stutterers, but to invoke stuttering on a temporary basis. I don’t think there was any intent to say, “Let’s see if we can create a long-term problem in these kids.”’ Another expert who has weighed in with a similar opinion is Oliver Bloodstein, a onetime student of Johnson who is now Professor Emeritus of speech at the City University of New York. Bloodstein has written that Johnson was already entertaining the central idea of the diagnosogenic theory in the years prior to Mary Tudor’s study, and it was indeed this theory that led him to initiate the study.

I recently stumbled on a little-known lecture by Johnson that he published in 1938 – one year before the Tudor study – under the title ‘The Role of Evaluation in Stuttering Behavior.’ This lecture laid out the core of the diagnosogenic theory and even claimed to provide evidence in support of it. ‘[I]n 92 per cent of the 47 child stutterers we have studied to date,’ he wrote, ‘the first order reaction was a simple, loose repetition of sound, syllable or word. When this was negatively evaluated – disapproved by the parents and then by the child – other reactions appeared in series. The higher order reactions tended to be more complex, involved more tension and more stoppages generally.’ In other words, castigating a child for run-of-the-mill disfluencies caused them to spiral into full-scale stuttering.

What’s more, the internal evidence of the Tudor study itself strongly implies that it was designed as a test of the theory. Tudor’s master’s thesis, which was based entirely on the study, was titled ‘An Experimental Study of the Effect of Evaluative Labelling on Speech Fluency.’ This is the ‘Introduction’ section, in its three-sentence entirety:

Certain published statements (Johnson, Language and Speech Hygiene) and examination of case histories suggest the possibility of regarding the diagnosis of stuttering as one of the factors responsible for the development of the disorder.

An investigation of the effects, particularly on speech fluency, of such a diagnosis is indicated from this point of view. In view of this consideration the present study has been done.

In other words, the only reason that Tudor put forward for undertaking the study was to test the diagnosogenic theory of stuttering.

In the second section of the thesis, titled ‘Problem’, Tudor stated that the study was designed to answer the following questions:

1. Will removing the label ‘stutterer’ from those who have been so labelled have any effect on their speech fluency?

2. Will endorsement of the label ‘stutterer’ previously applied to an individual have any effect on his speech fluency?

3. Will endorsement of the label ‘normal speaker’ previously applied to an individual have any effect on his speech fluency?

4. Will labelling a person, previously regarded as a normal speaker, a ‘stutterer’ have any effect on his speech fluency?

Evidently, the study was intended to test the effect of evaluative labelling – specifically, labelling as a stutterer or a normal speaker – on children’s speech. Although the written objectives do not spell out what the resulting ‘effects on speech fluency’ might be, it is reasonable to assume that they were expected to consist of the appearance or disappearance of stuttering – either of the complete phenomenon or some of its components – at least on a temporary basis, for otherwise the study does not make a great deal of sense. The use of the more general phrase ‘effects on speech fluency’ may have reflected the experimenters’ open minds about what the results of the study might be. More likely, though, it represented a wish not to spell out too baldly one of the study’s ethically troubling goals: the attempt to elicit in normal children the very trait that had plagued Wendell Johnson for his entire life.

Looking more closely at the objectives, one can see that objectives 2 and 3 are, by themselves, pointless. No one would expect that continuing to use labels that have been previously applied to a person would have any interesting effect on their speech fluency. Evidently, these two ‘objectives’ were not really intellectual goals in themselves but were listed merely as a way of indicating the need for control groups – subjects who were not manipulated and who were therefore not expected to show any effects. It is objectives 1 and 4, which involved changing a child’s previously applied labels, that incorporated the real goals of the study.

The children at the orphanage – a mix of real orphans and children whose parents had been forced to give them up by economic necessity – were used to being treated as guinea pigs. When Jim Dyer, a reporter, interviewed one of the now-elderly subjects for a 2001 article in the San Jose Mercury News, she told him that ‘Every week, somebody else from the university would come down and start testing us for God knows what.’ There is no record of Wendell Johnson’s motive for choosing an orphanage for the study, but we may guess that it was twofold: first, the convenience of having a large, fairly homogeneous collection of children at a single location and cared for by the same staff, and second, the ease of obtaining permission for the study. It would have been much harder, one may guess, to get permission from a child’s parents, given that one possible outcome of the treatment was the development of a speech disorder.

The first day of Mary Tudor’s visit to the orphanage was devoted to the selection of children for the study. There were 10 so-called stutterers in the orphanage – children whom the teachers and matrons considered to be stutterers and had labelled as such. All 10 of these children were included in the study. To balance them, Tudor and her five colleagues – fellow graduate students who were familiar with speech disorders – picked 12 children at random from the remaining population of children who had never been called stutterers by the staff. The 22 children selected for the study included both boys and girls, and their ages ranged from five to 16 years.

Each of these two groups was then further divided into two, thus providing the four subject groups needed for testing the four objectives described above. Tudor named the groups IA, IB, IIA, and IIB, but for ease of recall I’ll rename them as follows:

SN: Five children previously labelled as stutterers who were to be relabelled as normal speakers.

SS: Five children previously labelled as stutterers who would continue to be labelled as such.

NS: Six children previously labelled as normal speakers who were to be relabelled as stutterers.

NN: Six children previously labelled as normal speakers who would continue to be labelled as such.

In her thesis, Tudor maintained her subjects’ confidentiality, only referring to the individual children by code numbers. This confidentiality was breached by the Mercury News reporter Jim Dyer, however. The names of the six children in the NS group (the most ethically questionable group, consisting of normal-speaking children who were to be relabelled as stutterers) have also entered the public domain on account of the lawsuit against the state of Iowa in which they or their heirs are plaintiffs, and I will therefore use their names here. They were Norma Jean Pugh (aged five at the time), Elizabeth Ostert (nine), Clarence Fifer (11), Mary Korlaske (12), Phillip Spieker (12) and Hazel Potter (15).

The ages of these children immediately raise a significant issue with regard to the scientific value of the study. Stuttering typically begins in the preschool years; if Wendell Johnson began stuttering at five, as he related, then he was among a minority of late-onset stutterers. The children in Mary Tudor’s NS group, with the possible exception of Norma Jean Pugh, were well beyond the age at which stuttering typically develops. Thus, even if Johnson’s diagnosogenic theory were correct, Tudor’s study might have failed to validate it simply because the children had grown past the sensitive period of speech development during which they could be induced to stutter. Tudor did not discuss this issue in her thesis. It may be that she was forced to use older children because the orphanage had too few younger ones. Alternatively, she may have felt compelled to use children in the same age range as those in the stuttering groups, who averaged 12 years of age.

The plan of the study was as follows. At the beginning and again at the end of the study, the speech of all 22 children was to be assessed by a panel of five judges. Without knowledge of which experimental group each child belonged to, the judges would independently provide a numerical assessment of the child’s fluency and would also make a judgment as to whether the child stuttered or not. During the intervening four months, Tudor would apply labels to the children according to the groups they had been assigned to.

This is how Tudor’s thesis describes what was to be said to the children in the NS group at the beginning of the study. (Her actual words were modified to suit each child’s age and intelligence; some of the children had IQs that were well below average.)

The staff has come to the conclusion that you have a great deal of trouble with your speech. The type of interruptions which you have are very undesirable. These interruptions indicate stuttering. You have many of the symptoms of a child who is beginning to stutter. You must try to stop yourself immediately. Use your will power. Make up your mind that you are going to speak without a single interruption. It’s absolutely necessary that you do this. Do anything to keep from stuttering. Try very hard to speak fluently and evenly. If you have an interruption, stop and begin over. Take a deep breath whenever you feel you are going to stutter. Don’t ever speak unless you can do it right. You see how [the name of a child in the institution who stuttered rather severely] stutters, don’t you? Well, he undoubtedly started this very same way you are starting. Watch your speech every minute and try to do something to improve it. Whatever you do, speak fluently and avoid any interruptions whatsoever in your speech.

The children in the SN group were told the opposite – that they didn’t stutter, that any speech mistakes they made were inconsequential and that they should not worry about them. The children in the SS and NN groups were given messages consistent with their prior identities as stutterers or normal speakers respectively.

Tudor reinforced these messages on subsequent visits to the orphanage. She had eight or nine sessions with each of the children in the NS group, and three or four sessions with the children in the SN group. The thesis doesn’t mention any sessions with the children in the SS or NN groups: either she neglected to list these sessions, or perhaps she thought that their status as controls made the sessions unnecessary.

During the sessions with the children who were being relabelled as stutterers, Tudor would pick on slight speech errors that the children made in the course of their conversation and draw attention to them, saying that they were signs of stuttering and that the child should do everything in his or her power to avoid making the errors. In addition, she attempted to recruit the orphanage’s staff to help reinforce these messages. She told them that the NS and SS children were stuttering and that they should draw the children’s attention to all their speech errors. Similarly, she told the staff that the SN and NN children were not stutterers, and she asked them to ignore these children’s speech errors or to tell them that their speech was fine.

It seems that the staff didn’t cooperate in the fashion that Tudor hoped. Although a couple of the children in the NS group mentioned to her that their teachers had commented on their speech, Tudor wrote in her thesis that the staff generally didn’t follow her instructions, or only did so to a small degree. Thus, the overall amount of indoctrination that the children received was probably much less than Tudor originally desired.

Even so, the indoctrination clearly had an effect. Here is part of Tudor’s report of an interview with one of the children in the NS group, 11-year-old Clarence Fifer, on May 2 – three and a half months into the study:

‘How is your stuttering today?’

‘I don’t know.’

‘When do you seem to have the most trouble?’

‘When I’m playin’.’

‘Tell me something about it.’

‘Well, most of the time I stutter.’

‘Do the other boys notice it?’


‘Do they ever say anything?’


‘How do you know they notice it?’

‘They kinda laughed.’

‘What did you do then?’

‘Walked away.’

‘Does it bother you much?’

‘Yes, feel pretty bad.’

‘What do you do about it?’

‘Next time try to keep myself from doin’ it.’

‘How do you do that?’

‘Sometimes I take a breath.’

‘How does it feel when you speak?’

‘Kinda strain my throat.’

His speech had a breathy quality and he took a breath after every few words whether he needed it or not.

During this interview, he had 25 speech interruptions. The stuttering phenomena added to the previous list were deep inhalation, excessive exhalation and eyes closed.

Since Tudor deceived the staff in the same way that she deceived the children, the staff could not have been in a position to give any kind of informed consent to the study. Whether there was any person at the orphanage, such as its administrator, who was informed about the true purpose of the study is not stated in Tudor’s thesis. Jim Dyer, who interviewed Tudor in 2000, when she was 84 years old, wrote that Johnson obtained permission for the study from orphanage officials, but he didn’t make clear whether Johnson actually told these officials what would be done to the children. It’s possible that Johnson felt he had carte blanche to initiate any kind of study that he considered appropriate.

So what was the result of the study? What happened to the children’s speech? In his articles in the San Jose Mercury News in 2001, Dyer reported that most or all of the children in the NS group responded to being labelled as stutterers by stuttering. In doing so, Dyer said, they confirmed Wendell Johnson’s diagnosogenic theory. Dyer also reported that, besides stuttering, many of these children became withdrawn and isolated; they were reluctant to speak at all, and when they did speak, they used single words or brief phrases rather than complete sentences.

When Dyer tracked down some of the children – now elderly adults – for his articles, they supposedly confirmed the findings of the study. Norma Jean Pugh, who was five at the time of the study and spoke normally, apparently told Dyer that she had been induced to stutter by Tudor’s experiment, and that her stutter persisted for years, gravely damaging her social relationships and her education. Now, at age 64, she was a near-total recluse. Mary Korlaske, who likewise spoke normally at the start of the experiment, told Dyer that she too had been induced to stutter. She later got over the stuttering, but it recurred in 1999 after the death of her husband. She moved into a retirement home, where she rarely left her room. Dyer said that she stuttered when he interviewed her, although his description of her speech did not correspond closely to what a speech pathologist would call stuttering.

There is some evidence that Johnson too believed that labelling the children as stutterers caused at least some of them to stutter. In email correspondence, Johnson’s student Oliver Bloodstein told me that ‘In his lectures in the fall of 1942, Johnson made it clear that he thought the results of the Tudor study supported the diagnosogenic theory.’ Bloodstein also wrote (in a published article): ‘To the best of my recollection, he told us that one child actually did begin to stutter as a result of the procedure.’

Although Dyer visited the University of Iowa library, where Tudor’s thesis is archived, he did not say explicitly that he read the thesis, and most of his account is based on interviews with Tudor and the surviving subjects, along with readings of Tudor’s notes. (I was not able to locate Dyer for an interview.)

A totally different account of the Johnson-Tudor study was published in 2002 by Nicoline Ambrose and Ehud Yairi, the experts on stuttering at the University of Illinois. Ambrose and Yairi actually went back and read the 60-year-old typescript that was Tudor’s thesis, and what they wrote about it in the American Journal of Speech-Language Pathology contradicted the central assertion of Dyer’s articles: Tudor’s experiment, they said, did not cause any of the children to stutter.

This conclusion was based principally on the assessments of the children’s speech that were made at the beginning and end of the study by the panel of five blinded judges. Each judge independently rated the fluency of the children’s speech on a five-point scale, with 1 corresponding to the worst fluency and 5 to the best. At the beginning of the study, the average score for the children in the crucial NS group was 2.83 – roughly in the middle of the scale of fluency, rather than near 5 as one might expect. At the end of the study the average score for these children was 2.92. Statistically, the tiny shift of the average (by 0.09 units) was completely insignificant, and what’s more, it was a shift toward improved speech – the opposite of what Johnson’s theory would have predicted. The child in this group who showed the biggest shift was Mary Korlaske, who supposedly told Dyer that she was induced to stutter by the experiment. Her score shifted by 0.8 units – in the direction of greater fluency!

The fluency scores given by the judges included sub-scores for individual kinds of disfluency, some of which (such as repetition of syllables) were symptoms of stuttering, while others were not. Even when Ambrose and Yairi looked specifically at the scores for the stuttering-related disfluencies, there was no significant change over the course of the study.

Besides the numerical score, the judges added written comments at the end of the study. For each of the five children, including Korlaske, the majority of the judges simply wrote ‘No stuttering.’ Some of the judgments included statements like ‘appeared hesitant’ or ‘answered briefly’, but not one judge stated that any child stuttered or mentioned repetition of syllables, the key symptom of stuttering.

None of the other three groups showed any significant shift in their average speech fluency either. Even looked at individually, none of the children showed any substantial shift in the direction predicted by the theory. Thus, Ambrose and Yairi’s analysis showed that Dyer’s central claim – that the treatment caused the normally speaking children to stutter – was wrong. Nor, apparently, had the ‘stuttering’ children been caused to stop stuttering by being labelled as normal speakers.

One might think on this basis that the Tudor study was actually a refutation of Johnson’s theory rather than a confirmation, since changing the children’s labels had no effect on their propensity to stutter. But no; it was worse than that, according to Ambrose and Yairi. They reported that the study was so poorly designed and executed that it could not have been expected to reveal anything about the theory, regardless of whether the theory was right or wrong.

Most crucially, Ambrose and Yairi reported that many of the children had been assigned to the wrong subject groups. If they had been correctly assigned, the children in the NS and NN groups should have been given scores near the 5 (fluent) end of the scale, and the children in the SN and SS groups should have been given scores near the 1 (disfluent) end of the scale. In fact, however, there were no significant differences between the average scores of any of the groups before the treatment began. There were several children who were clearly described as stuttering or repeating syllables who were put in one of the ‘N’ groups, and several children who were described as not stuttering who were put in one of the ‘S’ groups. Apparently, the ‘stuttering’ children were selected simply because the orphanage staff said that they stuttered, and the ‘normal-speaking’ children were selected simply because the staff said that they didn’t stutter, and even though these assignments were not always borne out by the judges’ assessments, the children were left in the groups they were assigned to.

Thanks to an interlibrary loan, I was finally able to lay hands on Tudor’s thesis myself, and I was able to confirm what Ambrose and Yairi said on these points. However, this problem is not as devastating to the credibility of the study as Ambrose and Yairi implied. For one thing, many of the items used for fluency scoring were not criteria used in the diagnosis of stuttering, and for these unrelated items there was no reason to expect that stuttering children should score differently from non-stuttering children. Also, Tudor was concerned with the effects of changing labels: thus in selecting children for the study what mattered most was how a child was labelled prior to the study, not whether he or she actually stuttered. In this sense, Tudor had good reason to depend on the judgments of the orphanage staff, who had been in contact with the children for years.

There were other reasons, however, why no solid conclusions could be drawn from the study. I have already mentioned that the children were too old to test the diagnosogenic theory if susceptibility to criticism was limited to a developmental period around the age when children typically begin to stutter. Also, the children formed an unrepresentative sample in many respects, such as being institutionalised and also in most cases having below-average IQs. Furthermore, there were too few children in each group for any treatment effect to be likely to reach statistical significance.

Finally, the indoctrination of the children was done ineffectually. As already mentioned, the staff didn’t cooperate, leaving Tudor’s visits as almost the only ‘relabelling’ that the children experienced. It is hard to believe that just a few sessions with Tudor would somehow outweigh a lifetime of being exposed to the opposite labels. Tudor herself commented on this in the ‘Discussion’ section of her thesis: ‘As it was,’ she wrote, ‘the children received their stimulation almost entirely from the writer. If these children had been constantly reminded of their speech they would have undoubtedly reacted more positively [ie, by showing more signs of stuttering].’ And she predicted that ‘more extensive results’ could be expected if the experiment had been done in a ‘home situation’ with constant critiques from the children’s parents.

Although, in the ‘Results’ section, Tudor reported the onset of ‘stuttering phenomena’ during the treatment of some of the NS children, such as Clarence Fifer, she did not mention any induction of stuttering in her Discussion. This was presumably because neither the judges’ assessments nor her own numerical analyses of the children’s speech documented such an effect. But she did conclude that these children were affected in other ways:

All of the subjects in Group IIA [NS] showed similar types of speech behavior during the experimental period. A decrease in verbal output was characteristic of all six subjects; that is, they were reluctant to speak and spoke only when they were urged to. Second, their rate of speaking was decreased. They spoke more slowly and with greater exactness. They had a tendency to weigh each word before they said it. Third, the length of response was shortened. The two younger subjects responded with only one word whenever possible. Fourth, they all became more self-conscious. They appeared shy and embarrassed in many situations. Fifth, they accepted the fact that there was something definitely wrong with their speech. Sixth, every subject reacted to his speech interruptions in some manner. Some hung their heads, others gasped and covered their mouths with their hands; others laughed with embarrassment. In every case, the children’s behavior changed noticeably.

This description is similar to Dyer’s assertion that the children in the NS group became withdrawn, isolated and reluctant to speak. The notion that Tudor perceived that the children suffered this kind of harm is bolstered by the fact that, after completion of the study, she returned to the orphanage three times to ‘debrief’ the children in the NS group – that is, to reassure them that they were in fact normal speakers and to encourage them to speak freely.

After the first of these visits, in March 1940, she wrote to Johnson as follows: ‘I didn’t find them as free from the effects of the therapy I had inflicted upon them last year as I had hoped to. But as I am still a firm believer in the theory of evaluative labelling, I wasn’t too disappointed.’ This passage (quoted by Dyer) confirms that Tudor perceived that the children suffered some psychological harm from the study, and that this harm lasted at least through to the following year. In addition, it implies that she saw this as confirmation of Johnson’s theory. Either she was suggesting that the children were, in fact, stuttering, or she thought that any kind of difficulty in speaking, even a reluctance to speak motivated by low self-esteem, would count as a confirmation.

In their analysis, Ambrose and Yairi took a cautious view of the idea that the NS children suffered lasting harm. ‘There did seem to be an effect that when children were told “You’d better watch out what you say” or “You’d better not repeat”, they got anxious about talking,’ Ambrose told me. But she pointed out that there was no objective assessment of such effects in the study. In addition, she tended to pooh-pooh the notion that the children had been turned into asocial hermits by the experience, as some of the elderly survivors apparently claimed in their discussions with Dyer. ‘These were children who were given up [by their parents], children who were living in a situation way less than ideal,’ Ambrose said. ‘These individuals had so many difficulties in their lives, I don’t see why you would think that that particular study of a few months, with very little contact really, had any kind of significant effect on their lives.’ She mentioned an episode recounted by Dyer, in which Mary Korlaske ran away from the orphanage and made it all the way back to her mother’s home, whereupon her mother summoned the police to take Mary back to the orphanage. ‘I would think that would be a much more problematic situation than someone telling you that you should speak fluently,’ Ambrose commented.

Given that Ambrose and Yairi are members of the academic speech-pathology community, it may be that they were motivated to minimise the harm suffered by children in Tudor’s experiment as a way of protecting Johnson’s reputation. Nevertheless, they minced no words in their critique of the Tudor study’s scientific merits. All in all, it seems very difficult at this point to discern what role, if any, the Johnson-Tudor study played in the difficulties that some of the NS children experienced in their later lives.

Tudor’s thesis was never published, and Johnson never referred to it in his own published writings. The only reference he made to the study was in his lectures to students, in which he cited it in support of the diagnosogenic theory, as mentioned by Oliver Bloodstein. According to Jim Dyer, this failure to publicise the study amounted to a positive ‘hiding’ of it. Dyer maintained that the reason had to do with the atrocities perpetrated by Nazi doctors in the name of medical science during World War II. After those crimes came to light, the entire scientific community was sensitised to ethical issues in research involving human subjects, and in this new environment Tudor’s experiments would have seemed ethically indefensible.

Ehud Yairi took exception to Dyer’s imputation that Johnson actively hid the Tudor study. Yairi stressed that the thesis has been available in the University of Iowa library and that at least 19 people are recorded as having read it between 1941 and 1993. Yet shelving an unpublished master’s thesis in a university library is equivalent, in the majority of cases, to destroying it. Such theses are rarely read and almost never cited.

Given that the study, on its face, was actually a disconfirmation of his theory, Johnson’s failure to publish it or refer to it in print could be viewed in quite a different light – that is, as part of a strategy to protect his pet theory from findings that put it in doubt. If so, his and Tudor’s failure to publish would be ethically troubling. It is equally possible, however, that Johnson realised that the study had grave methodological shortcomings that would not have passed scrutiny. ‘In light of today’s standards, I would say this is totally not publishable because it’s a very poorly done study,’ commented Ambrose.

Although Tudor’s thesis languished on the library stacks, and Tudor herself left the University of Iowa to take up a position as a speech therapist, there was some institutional memory of her experiment. Some of Johnson’s students began referring to it as the Monster Study. As far as I know, the first use of this phrase in print was in 1988, when the late Franklin Silverman, a professor of speech pathology at Marquette University, used it as the title of an article about the study.

During the postwar years, Wendell Johnson gradually built up his diagnosogenic theory into the centrepiece of his thinking about how stuttering developed and how it should be prevented or cured. It was but one of many ‘blame the parents’ theories that were current in that epoch: schizophrenia, autism, homosexuality and many other traits were said by influential academics and doctors to be caused by the way parents and others treated young children.

Although Johnson held fast to his theory until his death, other lines of research gradually made it seem less and less plausible. For one thing, studies of young children – some of them done by Nicoline Ambrose – revealed that the speech mistakes made by children at the very onset of stuttering are not the normal disfluencies exhibited by other children, but distinct abnormalities that are immediately recognisable as the beginning of stuttering. A child can be talking entirely normally, Ambrose told me, and the next day he or she wakes up obviously stuttering. There is increasing evidence that there is a genetic predisposition to stuttering. The efficacy of some drugs in relieving stuttering also points toward a biological explanation and away from a theory based on family dynamics. At this point, Johnson’s diagnosogenic theory is dead, and most of the current research interest is in pinning down the brain miswiring that was hypothesised by Johnson’s teachers in the 1930s.

That is not to say that Johnson’s contributions were worthless. He made useful studies of the ‘stuttering block’ – the mental events that lead to the moment when the person who stutters does actually stutter. And Johnson is still considered to have helped people overcome or reduce their stuttering by encouraging them to see the trait as something they could control – an echo of the ‘Be the master of your fate and the captain of your soul’ mantra that he first heard as a teenager.

Ambrose emphasised this positive aspect of Johnson’s work in her discussion with me, but she also added a warning that it sometimes leads to negative consequences, especially a tendency to blame the person who stutters for his or her failure to improve. ‘[Johnson’s] idea that if stuttering can be learned it can be unlearned has done a disservice in some ways,’ she said, ‘because if you keep on stuttering, what’s wrong? And we don’t think that’s a failure on the part of the clinician or the client, if they’re not able to reduce their stuttering.’

During all the years in which Johnson’s diagnosogenic theory gained credence and then gradually lost it, the Tudor study remained largely unknown. Tudor herself, like the subjects of her study, led an inconspicuous life. There were occasional written references to the study, such as the 1988 article by Franklin Silverman, mentioned earlier. And in 1999, Jerome Halvorson – a onetime professor of speech pathology at the University of Wisconsin who had obtained a master’s degree at the University of Iowa – wrote a novel about the study, titled Abandoned: Now Stutter My Orphan. Halvorson used pseudonyms for the children rather than divulging their real names. Because the novel was self-published and highly idiosyncratic in style, it attracted little attention.

It was Jim Dyer’s articles in the San Jose Mercury News in 2001 that first drew wide attention to the Tudor study. Dyer’s articles contained what purported to be accurate details of the study, interlaced with sentimental accounts of a human drama that was largely orchestrated by Dyer himself.

A brief account of how this happened was given to the Des Moines Register by David Yarnold, executive editor of the Mercury News. Yarnold said that Dyer, besides his job at the Mercury News, was also a graduate student at the University of Iowa. Using his identity as a graduate student, Dyer gained access to the Iowa State Archives in Des Moines – specifically to confidential records that are only open to academics for bona fide research purposes. From these records, Yarnold said, Dyer obtained the real names of the children in Tudor’s study. (However, some of these names were also mentioned in Tudor’s notes, according to Dyer.) Armed with the names, Dyer tracked down and interviewed several of the surviving children – now elderly adults. He told them about the real purpose of Tudor’s study, of which the children had apparently never been informed, even in the course of Tudor’s ‘debriefing’ sessions after the study was completed. (Again, I haven’t been able to confirm this account with Dyer himself.)

Several of the subjects reacted to the information with understandable anger. Dyer describes the lives of some of them as having followed a downward spiral, starting with the stuttering that was allegedly caused by the study and progressing to near-complete social isolation in some cases. The centrepiece of Dyer’s story was Mary Korlaske, now a widow and a reclusive inhabitant of a retirement home, who was so incensed by what she heard from Dyer that she wrote an angry letter to Tudor – whether spontaneously or at Dyer’s suggestion, I don’t know. The letter, which was addressed to ‘Mary Tudor the Monster’, concluded as follows:

As I sit here crying… I wondered what I could say or send you to remind you of the hurtful pain that never goes away.

I’m sending you your own thimble.

God try to have mercy, or should he? You had no mercy for the children who still cry in the night.

- Mary Korlaske Nixon Case No.15

PS When the tears get realy (sic) bad, punch a whole (sic) in the bottom of the thimble like I did. Then the thimble won’t over flow.

Dyer hand-carried the letter to the 84-year-old Tudor – or else, he just happened to be present when it arrived in the mail – and he described Tudor’s reactions as she looked it over: the shaking of her head, the trembling of her hands and her comment, ‘Oh dear – I hope it isn’t a bomb.’

According to Dyer, Tudor herself made both positive and negative comments about the study. On the one hand, she was proud of a study that – as she still believed – proved Johnson’s diagnosogenic theory correct. ‘It was a small price to pay for science,’ Dyer quoted her as saying. ‘Look at the countless number of children it helped.’ On the other hand, she expressed shame at the apparent harm the study had done to some of the children. ‘That was the pitiful part – that I got them to trust me, and then I did this horrible thing to them,’ she said. Tudor deflected much of the blame onto Johnson who, she said, told her to perform the study in the fashion that she did, and neglected to have psychotherapists help the children to recover from the trauma that had been inflicted on them. I had hoped to hear more about this from Tudor herself, but when I tried calling her in the autumn of 2006 a neighbour informed me that she had died a few weeks earlier.

Dyer resigned from the Mercury News after the ruse he had allegedly used to discover the names of Tudor’s subjects came to light, and since that time he seems to have disappeared from public view.

In a lawsuit brought against the State of Iowa in 2003, the three surviving subjects in the NS group, along with the estates of the three who had died, requested $13 million in damages for intentional infliction of emotional distress, fraudulent misrepresentation, breach of fiduciary duty, invasion of privacy and civil conspiracy. In August of 2007 the plaintiffs settled with the State of Iowa in return for a total payment of $925,000.

As suggested earlier, it may be very difficult to tease apart the harm caused by the Tudor study from that caused by the many other traumas suffered by the orphanage children both before and after they participated in the study. But this uncertainty has not prevented many people from passing judgment on the ethical issues surrounding the case.

One person who has reviewed these issues in particular detail is Richard Schwartz, Presidential Professor in Speech and Hearing Sciences at the Graduate Center of the City University of New York. Schwartz has for many years chaired CUNY’s Institutional Review Board. In 2005, he published an analysis of the Tudor study with this question in mind: Would the study be approved if it were proposed to an IRB today? Schwartz believes that there are several reasons why it would not. First, there was the real possibility for harm to the children who were to be labelled as stutterers. ‘If you really believe the theory, you’re going to turn these children into stutterers,’ he told me in a 2006 interview. In addition, there was the possibility for more general psychological harm, as the lawsuit alleges did occur. These potential harms were not balanced by any potential benefits to the children.

Second, Schwartz says that the study would be judged unethical on account of its poor design and execution, which made it unlikely that it would add anything to the fund of knowledge about stuttering. If a study cannot generate useful findings it is unethical to engage human subjects in it.

Third, Schwartz believes that the study would be judged unethical because of its use of institutionalised children, who are considered to lack the same protections and capacity for choice that children living with their parents typically enjoy. In fact, current US federal regulations would rule out the Tudor study on these grounds alone.

Lastly, Schwartz believes that the use of deception in the Tudor study would be considered unethical today, because the scientific issues at the heart of the study could have been addressed by other means, and because the deception was not justified by any probability that real advances in scientific understanding or human welfare would result from its use.

Schwartz confessed to me that (like so many other people who have taken an interest in the case) he had not actually read the Tudor study but was dependent on information and extracts provided by others, especially by Ambrose and Yairi.

Most commentators have expressed criticisms of the Tudor study similar to those put forward by Schwartz. Ambrose and Yairi, for example, wrote that ‘It is unquestionable that the study was ethically wrong.’ But one person has mounted a vigorous defence of Wendell Johnson: his son. Nicholas Johnson, a law professor at the University of Iowa, wrote an article in which he maintained that historical research should be judged only by the ethical standards of its own time. He then came up with a laundry list of equally or even more questionable studies from that general period, including the infamous Tuskegee syphilis study in which poor black men were denied access to treatment for their syphilis for many years. According to the younger Johnson, his father and his father’s student did nothing that was outside the bounds of normal research practice at the time.

Much of what Nicholas Johnson says is perfectly true: researchers did often take advantage of institutionalised persons for research in those days, sometimes inflicting worse harm on them than Mary Tudor probably did to her subjects. But the interest in revisiting ethical questions about a historical study such as Johnson’s is not – except perhaps for Johnson’s son and a bunch of lawyers – to pass retroactive moral judgment on the deceased persons who conceived it and carried it out. Rather, it is to highlight the reasons why it is necessary to have written regulations governing the use of human subjects in research today, as well as IRBs to enforce them.

Nicholas Johnson has also attempted to defend his father by pointing out that Tudor herself – who has largely escaped personal criticism – should share whatever blame is assigned for the study. Certainly graduate students need to accept responsibility for their actions, but in this particular case it is clear that the project was entirely Wendell Johnson’s idea. When I read Tudor’s thesis, I was struck by her near-total lack of interest in the issues that her project addressed; even though her Introduction briefly mentioned Johnson’s diagnosogenic theory as the inspiration for the study, her Discussion – also very brief – included no assessment of what her findings meant for the theory. In general, Tudor’s thesis reads like the work of an industrious low-level operative who followed her advisor’s instructions to the letter, and who considered her job complete when she had done so.

‘Whether there was true harm or not, [Johnson and Tudor’s] subjects were intruded on in a way that they shouldn’t have been,’ commented Schwartz by way of wrapping up the ethical issues. ‘They should have given this more thought, even given the mores of the time. Most importantly, it’s a useful thing today to teach both senior researchers like myself, and students, that you really have to think about these things. It’s very important, if anything, to err on the side of being cautious and more protective of human subjects and to be really good at perspective-taking: “What would this be like if this were my child, my relative, me, in the situation of being a subject; would this be OK?” And this is really at the heart of what an IRB tries to do.’