Machines of Loving Grace: The Quest for Common Ground Between Humans and Robots - John Markoff (2015)
Chapter 4. THE RISE, FALL, AND RESURRECTION OF AI
Sitting among musty boxes in an archive at Stanford University in the fall of 2010, David Brock felt his heart stop. A detail-oriented historian specializing in the semiconductor industry, Brock was painstakingly poring over the papers of William Shockley for his research project on the life of Intel Corp. cofounder Gordon Moore. After leading the team that coinvented the transistor at Bell Labs, Shockley had moved back to Santa Clara County in 1955, founding a start-up company to make a new type of more manufacturable transistor. What had been lost, until Brock found it hidden among Shockley’s papers, was a bold proposal the scientist had made in an effort to persuade Bell Labs, in 1951 the nation’s premier scientific research institution, to build an “automatic trainable robot.”
For decades there have been heated debates about what led to the creation of Silicon Valley, and one of the breezier explanations is that Shockley, who had grown up near downtown Palo Alto, decided to return to the region that was once the nation’s fruit capital because his mother was then in ill health. He located Shockley Semiconductor Laboratory on San Antonio Road in Mountain View, just south of Palo Alto and across the freeway from where Google’s sprawling corporate campus is today. Moore was one of the first employees at the fledgling transistor company and would later become a member of the “traitorous eight,” the group of engineers who, because of Shockley’s tyrannical management style, defected from his start-up to start a competing firm. The defection is part of the Valley’s most sacred lore as an example of the intellectual and technical freedom that would make the region an entrepreneurial hotbed unlike anything the world had previously seen. Many have long believed that Shockley’s decision to locate his transistor company in Mountain View was the spark that ignited Silicon Valley. However, it is more interesting to ask what Shockley was trying to accomplish. He has long been viewed as an early entrepreneur, fatally flawed as a manager. Still, his entrepreneurial passion has served as a model for generations of technologists. But that was only part of the explanation.
Brock sat in the Stanford archives staring at a yellowing single-page proposal titled the “A.T.R. Project.” Shockley, true to his temper, didn’t mince words: “The importance of the project described below is probably greater than any previously considered by the Bell System,” he began. “The foundation of the largest industry ever to exist may well be built upon its development. It is possible that the progress achieved by industry in the next two or three decades will be directly dependent upon the vigor with which projects of this class are developed.” The purpose of the project was, bluntly, “the substitution of machines for men in production.” Robots were necessary because generalized automation systems lacked both the dexterity and the perception of human workers. “Such mechanization will achieve the ultimate conceivable economy on very long runs but will be impractical on short runs,” he wrote. Moreover, his original vision was not just about creating an “automatic factory,” but a trainable robot that could be “readily modified to perform any one of a wide variety of operations.” His machine would be composed of “hands,” “sensory organs,” “memory,” and a “brain.”1
Shockley’s inspiration for a humanlike factory robot was that assembly work often consists of a myriad of constantly changing unique motions performed by a skilled human worker, and that such a robot was the breakthrough needed to completely replace human labor. His insight was striking because it came at the very dawn of the computer age, before the impact of the technology had been grasped by most of the pioneering engineers. At the time it was only a half decade since ENIAC, the first general purpose digital computer, had been heralded in the popular press as a “giant brain,” and just two years after Norbert Wiener had written his landmark Cybernetics, announcing the opening of the Information Age.
Shockley’s initial insight presaged the course that automation would take decades later. For example, Kiva Systems, a warehouse automation company acquired in 2012 by Amazon for $775 million, recognized that the most difficult functions to automate in the modern warehouse were ones that required human eyes and hands, like identifying and grasping objects. Without perception and dexterity, robotic systems are limited to the most repetitive jobs, and so Kiva took the obvious intermediate step and built mobile robots that carried items to stationary human workers. Once machine perception and robotic hands became better and cheaper, humans could disappear entirely.
Indeed, Amazon made an exception to its usual policy of secrecy and invited the press to tour one of its distribution facilities in Tracy, California, during the Christmas buying season in December of 2014. What those on the press tour did not see was the development of an experimental station inside the facility where a robot arm performed the “piece pick” operations—the work now reserved for humans. Amazon is experimenting with a Danish robot arm designed to do the remaining human tasks.
In the middle of the last century, while Shockley expressed no moral qualms about using trainable robots to displace humans, Wiener saw a potential calamity. Two years after writing Cybernetics he wrote The Human Use of Human Beings, an effort to assess the consequences of a world full of increasingly intelligent machines. Despite his reservations, Wiener had been instrumental in incubating what Brock describes as an “automation movement” during the 1950s.2 He traces the start of what would become a national obsession with automation to February 2, 1955, when Wiener and Gordon Brown, the chair of the MIT electrical engineering department, spoke to an evening panel in New York City attended by five hundred members of the MIT Alumni Association on the topic of “Automation: What is it?”
On the same night, on the other side of the country, electronics entrepreneur Arnold O. Beckman chaired a banquet honoring Shockley alongside Lee de Forest, inventor of the triode, a fundamental vacuum tube. At the event Beckman and Shockley discovered they were both “automation enthusiasts.”3 Beckman had already begun to refashion Beckman Instruments around automation in the chemical industries, and at the end of the evening Shockley agreed to send Beckman a copy of his newly issued patent for an electro-optical eye. That conversation led to Beckman funding Shockley Semiconductor Laboratory as a Beckman Instruments subsidiary, but passing on the opportunity to purchase Shockley’s robotic eye. Shockley had written his proposal to replace workers with robots amid the nation’s original debate over “automation,” a term popularized by John Diebold in his 1952 book Automation: The Advent of the Automatic Factory.
Shockley’s prescience was so striking that when Rodney Brooks, himself a pioneering roboticist at the Stanford Artificial Intelligence Laboratory in the 1970s, read Brock’s article in IEEE Spectrum in 2013, he passed Shockley’s original 1951 memo around his company, Rethink Robotics, and asked his employees to guess when the memo had been written. No one came close. That memo predates by more than a half century Rethink’s Baxter robot, introduced in the fall of 2012. Yet Baxter is almost exactly what Shockley proposed in the 1950s—a trainable robot with an expressive “face” on an LCD screen, “hands,” “sensory organs,” “memory,” and, of course, a “brain.”
The philosophical difference between Shockley and Brooks is that Brooks’s intent has been for Baxter to cooperate with human workers rather than replace them, taking over dull, repetitive tasks in a factory and leaving more creative work for humans. Shockley’s original memo demonstrates that Silicon Valley had its roots in the fundamental paradox that technology both augments and dispenses with humans. Today the paradox remains sharper than ever. Those who design the systems that increasingly reshape and define the Information Age are making choices to build humans in or out of the future.
Silicon Valley’s hidden history presages Google’s more recent “moon shot” effort to build mobile robots. During 2013 Google quietly acquired many of the world’s best roboticists in an effort to build a business claiming leadership in the next wave of automation. Like the secretive Google car project, the outlines of Google’s mobile robot business have remained murky. It is still unclear whether Google as a company will end up mostly augmenting or replacing humans, but today the company is dramatically echoing Shockley’s six-decade-old trainable robot ambition.
The dichotomy between AI and IA had been clear for many years to Andy Rubin, a robotics engineer who had worked for a wide range of Silicon Valley companies before coming to Google to build the company’s smartphone business in 2005. In 2013 Rubin had left his post as head of the company’s Android phone business and begun quietly acquiring some of the best robotics companies and technologists in the world. He found a new home for the business on California Ave., on the edge of Stanford Industrial Park just half a block away from the original Xerox PARC laboratory where the Alto, the first modern personal computer, was designed. Rubin’s building was unmarked, but an imposing statue of a robot in an upstairs atrium was visible from the street below. That is, until one night the stealthy roboticists received an unhappy call from the neighbor directly across the street. The eerie-looking robot was giving their young son nightmares. The robot was moved inside where it was no longer visible to the outside world.
Years earlier, Rubin, who was also a devoted robot hobbyist, had helped fund Stanford AI researcher Sebastian Thrun’s effort to build Stanley, the autonomous Volkswagen that would eventually win a $2 million DARPA prize for navigating unaided through more than a hundred miles of California desert. “Personal computers are growing legs and beginning to move around in the environment,” Rubin said in 2005.4 Since then there has been a growing wave of interest in robotics in Silicon Valley. Andy Rubin was simply an early adopter of Shockley’s original insight.
During the half decade after Shockley’s 1955 move to Palo Alto, the region became ground zero for social, political, and technological forces that would reshape American society along lines that to this day define the modern world. Palo Alto would be transformed from its roots as a sleepy college town into one of the world’s wealthiest communities. However, during the 1960s and 1970s, the Vietnam War, civil rights movement, and rise of the counterculture all commingled with the arrival of microprocessors, personal computing, and computer networking.5 In a handful of insular computer laboratories, hackers and engineers found shelter from a fractious world. In 1969 Richard Nixon was inaugurated president, Seymour Hersh reported the My Lai massacre, and astronauts Neil Armstrong and Buzz Aldrin walked on the moon. Americans had for the first time traveled to another world, but the nation was at the same time mired in disastrous foreign conflict.
The year 1968 had seen the premiere of the movie 2001: A Space Odyssey painting a stark view of both the potential and pitfalls of artificial intelligence. HAL—the computer that felt impelled to violate Asimov’s laws of robotics, a 1942 dictum forbidding machines to injure humans, even to ensure their survival—had defined the robot in popular culture. By the late 1960s, science-fiction writers were the nation’s technology seers and AI had become a promising new technology in the form of computing and robotics—playing out both in visions of technological paradise and as populist paranoia. The future seemed almost palpable in a nation that had literally gone from The Flintstones to The Jetsons between 1960 and 1963.
Amid this cultural turmoil Charlie Rosen began building the world’s first real robot as a platform for conducting artificial intelligence experiments. Rosen was a Canadian-born applied physicist who was thinking about a wide range of problems related to computing, including sensors, new kinds of semiconductors, and artificial intelligence. He was something of a renaissance man: he coauthored one of the early textbooks on transistors and had developed an early interest in neural nets—computer circuits that showed promise in recognizing patterns, “learning” by simulating behavior of biological neurons.
As a result, Stanford Research Institute became one of the two or three centers of research on neural nets and perceptrons, efforts to mimic human forms of biological learning. Rosen was a nonstop fount of ideas, continually challenging his engineers about the possibility of remarkably far-out experiments. Peter Hart, a young Stanford electrical engineer who had done research on simple pattern recognizers, remembered frequent encounters with Rosen. “Hey, Pete,” Rosen would say while pressing his face up to the young scientist’s, close enough that Hart could see Rosen’s quivering bushy eyebrows while Rosen poked his finger into Hart’s chest. “I’ve got an idea.” That idea might be an outlandish concept for recognizing speech, involving a system to capture spoken words in a shallow tank of water about three meters long, employing underwater audio speakers and a video camera to capture the standing wave pattern created by the sound.
After describing each new project, Rosen would stare at his young protégé and shout, “What are you scared of?” He was one of the early “rainmakers” at SRI, taking regular trips to Washington, D.C., to interest the Pentagon in funding projects. It was Rosen who was instrumental in persuading the military to fund Doug Engelbart for his original idea of augmenting humans with computers. Rosen also wrote and sold the proposal to develop a mobile “automaton” as a test bed for early neural networks and other AI programs. At one meeting with some Pentagon generals he was asked if this automaton could carry a gun. “How many do you need?” was his response. “I think it should easily be able to handle two or three.”
It took the researchers a while to come up with a name for the project. “We worked for a month trying to find a good name for it, ranging from Greek names to whatnot, and then one of us said, ‘Hey, it shakes like hell and moves around, let’s just call it Shakey,’”6 Hart recalled.
Eventually Rosen would become a major recipient of funding from the Defense Advanced Research Projects Agency at the Pentagon, but before that he stumbled across another source of funding, also inside the military. He managed to get an audience with one of the few prominent women in the Pentagon, mathematician Ruth Davis. When Rosen told her he wanted to build an intelligent machine, she exclaimed, “You mean it could be a sentry? Could you use it to replace a soldier?” Rosen confided that he didn’t think robot soldiers would be on the scene anytime soon, but he wanted to start testing prerequisite ideas about machine vision, planning, problem-solving, and understanding human language. Davis became enthused about the idea and was an early funder of the project.
Shakey was key because it was one of just a handful of major artificial intelligence projects that began in the 1960s, causing an explosion of early work in AI that would reverberate for decades. Today Shakey’s original DNA can be found in everything from the Kiva warehouse robot and Google’s autonomous car to Apple’s Siri intelligent assistant. Not only did it serve to train an early generation of researchers, but it would be their first point of engagement with technical and moral challenges that continue to frame the limits and potential of AI and robotics today.
Many people believed Shakey was a portent for the future of AI. In November 1970 Life magazine hyped the machine as something far more than it actually was. The story appeared alongside a cover story about a coed college dormitory, ads for a car with four-wheel drive, and a Sony eleven-inch television. Reporter Brad Darrach’s first-person account took great liberties with Shakey’s capabilities in an effort to engage with the coming-of-age complexities of the machine era. He quoted a researcher at the nearby Stanford Artificial Intelligence Laboratory as acknowledging that the field had so far not been able to endow machines with complex emotional reactions such as human orgasms, but the overall theme of the piece was a reflection of the optimism that was then widespread in the robotics community.
The SRI researchers, including Rosen and his lieutenants, Peter Hart and Bert Raphael, were dismayed by a description claiming that Shakey was able to roll freely through the research laboratory’s hallways at a faster-than-human-walking clip, pausing only to peer in doorways while it reasoned in a humanlike way about the world around it. According to Raphael, the description was particularly galling because the robot had not even been operational when Darrach visited. It had been taken down while it was being moved to a new control computer.7
Marvin Minsky, the MIT AI pioneer, was particularly galled and wrote a long rebuttal, accusing Darrach of fabricating quotes. Minsky was quoted saying that the human brain was just a computer made out of “meat.” However, he was most upset at being linked to an assertion: “In from three to eight years we will have a machine with the general intelligence of an average human being. I mean a machine that will be able to read Shakespeare, grease a car, play office politics, tell a joke, have a fight. At that point the machine will begin to educate itself with fantastic speed. In a few months it will be at genius level and a few months after that its powers will be incalculable.”8 In hindsight, Darrach’s alarms seem almost quaint. Whether he was clueless or willfully deceptive, his broader point was simply that sooner or later (and he clearly wanted the reader to believe it was sooner) society would have to decide how it would live with its cybernetic offspring.
Indeed, despite his frustration with the inaccurate popularization of Shakey, two years later, in a paper presented at a technical computing and robotics conference in Boston, Rosen would echo Darrach’s underlying theme. He rejected the idea of a completely automated, “lights-out factory” in the “next generation” largely because of the social and economic chaos that would ensue. Instead, he predicted that by the end of the 1970s, the arrival of factory and service robots (under the supervision of humans) would eliminate repetitive tasks and drudgery. The arrival of the robots would be accompanied by a new wave of technological unemployment, he argued, and it was incumbent upon society to begin rethinking issues such as the length of the workweek, retirement age, and lifetime service.9
For more than five years, SRI researchers attempted to design a machine that was nominally an exercise in pure artificial intelligence. Beneath the veneer of science, however, the Pentagon was funding the project with the notion that it might one day lead to a military robot capable of tracking the enemy without risking lives of U.S. or allied soldiers. Shakey was not only the touchstone for much of modern AI research as well as projects leading to the modern augmentation community—it was also the original forerunner of the military drones that now patrol the skies over Afghanistan, Iraq, Syria, and elsewhere.
Shakey exemplified the westward migration of computing and early artificial intelligence research during the 1960s. Although Douglas Engelbart, whose project was just down the hall, was a West Coast native, many others were migrants. Artificial intelligence as a field of study was originally rooted in a 1956 Dartmouth College summer workshop where John McCarthy was a young mathematics professor. McCarthy had been born in 1927 in Boston of an Irish Catholic father and Lithuanian Jewish mother, both active members of the U.S. Communist Party. His parents were intensely intellectual and his mother was committed to the idea that her children could pursue any interests they chose. At twelve McCarthy encountered Eric Temple Bell’s Men of Mathematics, a book that helped determine the careers of many of the best and brightest of the era including scientists Freeman Dyson and Stanislaw Ulam. McCarthy was viewed as a high school math prodigy and only applied to Caltech, where Bell was a professor, something he later decided had been an act of “arrogance.” On his application he described his plans in a single sentence: “I intend to be a professor of mathematics.” Bell’s book had given him a realistic view of what that path would entail. McCarthy had decided that mathematicians were rewarded principally by the quality of their research, and he was taken with the idea of the self-made intellectual.
At Caltech he was an ambitious student. He jumped straight to advanced calculus and simultaneously took a range of other courses including aeronautical engineering. He was drafted relatively late in the war, so his army career was more about serving as a cog in the bureaucracy than combat. Stationed close to home at Fort MacArthur in the port city of San Pedro, California, he began as a clerk, preparing discharges, then promotions for soldiers leaving the military. He made his way to Princeton for graduate school and promptly paid a visit to John von Neumann, the applied mathematician and physicist who would become instrumental in defining the basic design of the modern computer.
At this point the notion of “artificial intelligence” was fermenting in McCarthy’s mind, but the coinage had not yet come to him. That wouldn’t happen for another half decade in conjunction with the summer 1956 Dartmouth conference. He had first come to the concept in grad school when attending the Hixon Symposium on Cerebral Mechanisms in Behavior at Caltech.10 At that point there weren’t programmable computers, but the idea was in the air. Alan Turing, for example, had written about the possibility the previous year, to receptive audiences on both sides of the Atlantic. McCarthy was thinking about intelligence as a mathematical abstraction rather than something realizable—along the lines of Turing—through building an actual machine. It was an “automaton” notion of creating human intelligence, but not of the kind of software cellular automata that von Neumann would later pursue. McCarthy focused instead on an abstract notion of intelligence that was capable of interacting with the environment. When he told von Neumann about it, the scientist exclaimed, “Write it up!” McCarthy thought about the idea a lot but never published anything. Years later he would express regret at his inaction. Although his thesis at Princeton would focus on differential equations, he also developed an interest in logic, and a major contribution to the field of artificial intelligence would later come from his application of mathematical logic to common sense reasoning. He arrived at Princeton a year after Marvin Minsky and discovered that they were both already thinking about the idea of artificial intelligence. At the time, however, there were no computers to allow them to work with the ideas, and so the concept would remain an abstraction.
As a graduate student, McCarthy was a contemporary of John Forbes Nash, the mathematician and Nobel laureate who would later be celebrated in Sylvia Nasar’s 1998 biography, A Beautiful Mind. The Princeton graduate students made a habit of playing practical jokes on each other. McCarthy, for example, fell victim to a collapsing bed. He found that another graduate student was a double agent in their games, plotting with McCarthy against Nash while at the same time plotting with Nash against McCarthy. Game theory was in fashion at the time and Nash later received his Nobel Prize in economics for contributions to that field.
During the summer of 1952 both McCarthy and Minsky were hired as research assistants by mathematician and electrical engineer Claude Shannon at Bell Labs. Shannon, known as the father of “information theory,” had created a simple chess-playing machine in 1950, and there was early interest in biological-growth simulating programs known as “automata,” of which John Conway’s 1970 Game of Life would become the most famous.
Minsky was largely distracted by his impending wedding, but McCarthy made the most of his time at Bell Labs, working with Shannon on a collection of mathematical papers that was named at Shannon’s insistence Automata Studies.11 Using the word “automata” was a source of frustration for McCarthy because it shifted the focus of the submitted papers away from the more concrete artificial intelligence ideas and toward more esoteric mathematics.
Four years later he settled the issue when he launched the new field that now, six decades later, is transforming the world. He backed the term “artificial intelligence” as a means of “nail[ing] the idea to the mast”12 and focusing the Dartmouth summer project. One unintended consequence was that the term implied the idea of replacing the human mind with a machine, and that would contribute to the split between the artificial intelligence and intelligence augmentation researchers. The christening of the field, however, happened in 1956 during the event that McCarthy was instrumental in organizing: the Dartmouth Summer Research Project on Artificial Intelligence, which was underwritten with funding from the Rockefeller Foundation. As a branding exercise it would prove a momentous event. Other candidate names for the new discipline included cybernetics, automata studies, complex information processing, and machine intelligence.13
McCarthy wanted to avoid the term “cybernetics” because he thought of Norbert Wiener, who had coined the term, as something of a bombastic bore and he chose to avoid arguing with him. He also wanted to avoid the term “automata” because it seemed remote from the subject of intelligence. There was still another dimension inherent in the choice of the term “artificial intelligence.” Many years later in a book review taking issue with the academic concept known as the “social construction of technology,” McCarthy took pains to distance artificial intelligence from its human-centered roots. It wasn’t about human behavior, he insisted.14
The Dartmouth conference proposal, he would recall years later, had made no reference to the study of human behavior, “because [he] didn’t consider it relevant.”15 Artificial intelligence, he argued, was not considered human behavior except as a possible hint about performing humanlike tasks. The only Dartmouth participants who focused on the study of human behavior were Allen Newell and Herbert Simon, the Carnegie Institute researchers who had already won acclaim for ingeniously bridging the social and cognitive sciences. Years later the approach propounded by the original Dartmouth conference members would become identified with the acronym GOFAI, or “Good Old-Fashioned Artificial Intelligence,” an original approach centered on achieving human-level intelligence through logic and the branch of problem-solving rules called heuristics.
IBM, by the 1950s already the world’s largest computer maker, had initially been involved in the planning for the summer conference. Both McCarthy and Minsky had spent the summer of 1955 in the IBM laboratory that had developed the IBM 701, a vacuum tube mainframe computer of which only nineteen were made. In the wake of the conference, several IBM researchers did important early work on artificial intelligence research, but in 1959 the computer maker pulled the plug on its AI work. There is evidence that the giant computer maker was fearful that its machines would be linked to technologies that destroyed jobs.16 At the time the company chief executive Thomas J. Watson Jr. was involved in national policy discussions over the role and consequences of computers in automation and did not want his company to be associated with the wholesale destruction of jobs. McCarthy would later call the act “a fit of stupidity” and a “coup.”17
During those early years McCarthy and Minsky remained largely inseparable (Minsky’s future wife even brought McCarthy along when she took Minsky home to introduce him to her parents), even though their ideas about how to pursue AI increasingly diverged. Minsky’s graduate studies had been on the creation of neural nets. As his work progressed, Minsky would increasingly place the roots of intelligence in human experience. In contrast, McCarthy looked throughout his career for formal mathematical-logical ways to model the human mind.
Yet despite these early differences, the field remained remarkably collegial and in the hands of researchers with privileged access to the jealously guarded room-sized computers of the era. As McCarthy recalls it, the MIT Artificial Intelligence Laboratory came into being in 1958 after both he and Minsky had joined the university faculty. One day McCarthy met Minsky in a hallway and said to him, “I think we should have an AI project.” Minsky responded that he thought that was a good idea. Just then the two men saw Jerome Wiesner, then head of the Research Laboratory of Electronics, walking toward them.
McCarthy piped up, “Marvin and I want to have an AI project.”
“What do you want?” Wiesner responded.
Thinking quickly on his feet, McCarthy said, “We’d like a room, a secretary, a keypunch, and two programmers.”
To which Wiesner replied, “And how about six graduate students?”
Their timing would prove to be perfect. MIT had just received a large government grant “to be excellent,” but no one really knew what “excellent” meant. The grant supported six mathematics graduate students at the time, but Wiesner had no idea what they would do. So for Wiesner, McCarthy and Minsky were a serendipitous solution.18
The funding grant came through in the spring of 1958, immediately in the wake of the Soviet Sputnik satellite. U.S. federal research dollars were just starting to flow in large amounts to universities. It was widely believed that the generous support of science would pay off for the U.S. military, and that year President Eisenhower formed the Advanced Research Projects Agency, ARPA, to guard against future technological surprises.
The fortuitous encounter by the three men had an almost unfathomable impact on the world. A number of the “six graduate students” were connected with the MIT Model Railway Club, an unorthodox group of future engineers drawn to computing as if by a magnet. Their club ethos would lead directly to what became the “hacker culture,” which held as its most prized value the free sharing of information.19 McCarthy would help spread the hacker ethic when he left MIT in 1962 and set up a rival laboratory at Stanford University. Ultimately the original hacker culture would also foment social movements around free/open-source software, Creative Commons, and network neutrality. While still at MIT, McCarthy, in his quest for a more efficient way to conduct artificial intelligence research, had invented computer time-sharing, as well as the Lisp programming language. He had an early notion that his AI, when it was perfected, would be interactive, and that it would be logical to design it on a computing system shared by multiple users, rather than one requiring users to sign up to use the computer one at a time.
When MIT decided to do a survey on the wisdom of building a time-sharing system instead of immediately building what McCarthy had proposed, he decided to head west. Asking university faculty and staff what they thought of computer time-sharing would be like surveying ditchdiggers about the value of a steam shovel, he would later grouse.20
He was thoroughly converted to the West Coast counterculture. Although he had long since left the Communist Party, he was still on the Left and would soon be attracted to the anti-establishment community around Stanford University. He took to wearing a headband to pair with his long hair and became an active participant in the Free University that sprang up on the Midpeninsula around Stanford. Only when Russia crushed the Czech uprising in 1968 did he experience his final disillusionment with socialism. Not long afterward, while arguing over the wisdom of nonviolence during a Free U meeting, one of the radicals threatened to kill McCarthy, and he consequently ricocheted permanently to the Right. He soon registered as a Republican.
At the same time his career blossomed. Being a Stanford professor was a hunting license for funding and on his way to Stanford he turned to his friend J. C. R. Licklider, a former MIT psychologist, who headed ARPA’s Information Processing Techniques Office beginning in 1962. Licklider had collaborated with McCarthy on an early paper on time-sharing and he funded an ambitious time-sharing program at MIT after McCarthy moved to Stanford. McCarthy would later say that he never would have left if he had known that Licklider would be pushing time-sharing ideas so heavily.
On the West Coast, McCarthy found few bureaucratic barriers and quickly built an artificial intelligence lab at Stanford to rival the one at MIT. He was able to secure a computer from Digital Equipment Corporation and found space in the hills behind campus in the D.C. Power Laboratory, in a building and on land donated to Stanford by GTE after the telco canceled a plan for a research lab on the West Coast.
The Stanford Artificial Intelligence Laboratory quickly became a California haven for the same hacker sensibility that had spawned at MIT. Smart young computer hackers like Steve “Slug” Russell and Whitfield Diffie followed McCarthy west, and during the next decade and a half a startling array of hardware engineers and software designers would flow through the laboratory, which maintained its countercultural vibe even as McCarthy became politically more conservative. Both Steve Jobs and Steve Wozniak would hold on to sentimental memories of their visits as teenagers to the Stanford laboratory in the hills. SAIL would become a prism through which a stunning group of young technologists as well as full-blown industries would emerge.
Early work in machine vision and robotics began at SAIL, and the laboratory was indisputably the birthplace of speech recognition. McCarthy gave Raj Reddy his thesis topic on speech understanding, and Reddy went on to become the seminal researcher in the field. Mobile robots, paralleling Shakey at Stanford Research Institute, would be pursued at SAIL by researchers like Hans Moravec and later Rodney Brooks, both of whom became pioneering robotics researchers at Carnegie Mellon and MIT, respectively.
It proved to be the first golden era of AI, with research on natural language understanding, computer music, expert systems, and video games like Spacewar. Kenneth Colby, a psychiatrist, even worked on a refined version of Eliza, the online conversation system originally developed by Joseph Weizenbaum at MIT. Colby’s simulated person was known as “Parry,” with an obliquely bent paranoid personality. Reddy, who had previous computing experience using an early IBM mainframe called the 650, remembered that the company had charged $1,000 an hour for access to the machine. Now he found he “owned” a computer that was a hundred times faster for half of each day—from eight o’clock in the evening until eight the next morning. “I thought I had died and gone to heaven,” he said.21
McCarthy’s laboratory spawned an array of subfields, and one of the most powerful early on was known as knowledge engineering, pioneered by computer scientist Ed Feigenbaum. Begun in 1965, his first project, Dendral, was a highly influential early effort in the area of software expert systems intended to capture and organize human knowledge, and was initially intended to help chemists identify unknown organic molecules. It was a cooperative project among computer scientists Feigenbaum and Bruce Buchanan and two superstars from other academic fields—Joshua Lederberg, a molecular biologist, and Carl Djerassi, a chemist known for inventing the birth control pill—to automate the problem-solving strategies of an expert human organic chemist.
Buchanan would recall that Lederberg had a NASA contract related to the possibility of life on Mars and that mass spectrometry would be an essential tool in looking for such life: “That was, in fact, the whole Dendral project laid out with a very specific application, namely, to go to Mars, scoop up samples, look for evidence of organic compounds.”22 Indeed, the Dendral project began in 1965 in the wake of a bitter debate within NASA over what the role of humans would be in the moon mission. Whether to keep a human in the control loop was sharply debated inside the agency at the dawn of spaceflight, and is again today, decades later, concerning a manned mission to Mars.
The original AI optimism that blossomed at SAIL would hold sway throughout the sixties. It is now lost in history, but Moravec, who as a graduate student lived in SAIL’s attic, recalled years later that when McCarthy first set out the original proposal he told ARPA that it would be possible to build “a fully intelligent machine” in the space of a decade.23 From the distance of more than a half century, it seems both quixotic and endearingly naive, but from his initial curiosity in the late 1940s, before there were computers, McCarthy had defined the goal of creating machines that matched human capabilities.
Indeed, during the first decade of the field, AI optimism was immense, as was obvious from the proposal for the 1956 Dartmouth workshop:
The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it. An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves. We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer.24
Not long afterward Minsky would echo McCarthy’s optimism, turning a lone graduate student loose on the problem of machine vision, figuring that it was a suitable problem to be solved as a summer project.25 “Our ultimate objective is to make programs that learn from their experience as effectively as humans do,” McCarthy wrote.26
As part of that effort he created a laboratory that was a paradise for researchers who wanted to mimic humans in machine form. At the same time it would also create a cultural chasm that resulted in a computing world with two separate research communities—those who worked to replace the human and those who wanted to use the same technologies to augment the human mind. As a consequence, for the past half century an underlying tension between artificial intelligence and intelligence augmentation—AI versus IA—has been at the heart of progress in computing science as the field has produced a series of ever more powerful technologies that are transforming the world.
It is easy to argue that AI and IA are simply two sides of the same coin. There is a fundamental distinction, however, between approaches to designing technology to benefit humans and designing technology as an end in itself. Today, that distinction is expressed in whether increasingly capable computers, software, and robots are designed to assist human users or to replace them. Early on some of the researchers who passed through SAIL rebelled against McCarthy-style AI. Alan Kay, who pioneered the concept of the modern personal computer at Xerox during the 1970s, spent a year at SAIL, and would later say it was one of the least productive years of his career. He already had fashioned his Dynabook idea—“a personal computer for children of all ages”27—that would serve as the spark for a generation of computing, but he remained an outsider in the SAIL hacker culture. For others at SAIL, however, the vision was clear: machines would soon match and even replace humans. They were the coolest things around and in the future they would meet and then exceed the capabilities of their human designers.
You must drive several miles from the Carnegie Mellon University campus to reach a pleasantly obscure Pittsburgh residential neighborhood to find Hans Moravec. His office is tucked away in a tiny apartment at the top of a flight of stairs around the corner from a small shopping street. Inside, Moravec, who retains his childhood Austrian accent, has converted a two-room apartment into a hideaway office where he can concentrate without interruption. The apartment opens into a cramped sitting room housing a small refrigerator. At the back is an even smaller office, with curtains down, dominated by large computer displays.
Several decades ago, when he captured the public’s attention as one of the world’s best-known robot designers, magazines often described him as “robotic.” In person, he is anything but: he breaks into laughter frequently and has a self-deprecating sense of humor. Still an adjunct professor at the Robotics Institute at Carnegie Mellon, where he taught for many years, Moravec, one of John McCarthy’s best-known graduate students, has largely vanished from the world he helped create.
When Robert M. Geraci, a religious studies professor at Manhattan College and author of Apocalyptic AI: Visions of Heaven in Robotics, Artificial Intelligence, and Virtual Reality (2010), came to Pittsburgh to conduct his research several years ago, Moravec politely declined to see him, citing his work on a recent start-up. Geraci is one of a number of authors who have painted Moravec as the intellectual cofounder, with Ray Kurzweil, of a techno-religious movement that argues that humanity will inevitably be subsumed as a species by the AIs and robots we are now creating. In 2014 this movement gained generous exposure as high-profile technological and scientific luminaries such as Elon Musk and Stephen Hawking issued tersely worded warnings about the potential threat that futuristic AI systems hold for the human species.
Geraci’s argument is that there is a generation of computer technologists who, in looking forward to the consequences of their inventions, have not escaped Western society’s religious roots but rather recapitulated them. “Ultimately, the promises of Apocalyptic AI are almost identical to those of Jewish and Christian apocalyptic traditions. Should they come true, the world will be, once again, a place of magic,”28 Geraci wrote. For the professor of religion, the movement could in fact be reduced to the concept of alienation, which in his framing is mainly about the overriding human fear of dying.
Geraci’s conception of alienation isn’t simply a 1950s James Dean-like disconnect from society. Yet it is just as hard to pin Moravec on the more abstract concept of fear of death. The robotics pioneer became legendary for taking up residence in the attic of McCarthy’s SAIL lab during the 1970s, when it was a perfect counterculture world for the first generation of computer hackers who discovered that the machines they had privileged access to could be used as “fantasy amplifiers.”
During the 1970s, McCarthy continued to believe that artificial intelligence was within reach even with the meager computing resources then at hand, famously noting that a working AI would require: “1.8 Einsteins and one-tenth the resources of the Manhattan Project.”29 In contrast, Moravec’s perspective was rooted in the rapidly accelerating evolution of computing technology. He quickly grasped the implications of Moore’s law—the assertion that over time computing power would increase exponentially—and extended that observation to what he believed would be the logical conclusion: machine intelligence was inevitable and moreover it would happen relatively soon. He summed up the obstacles faced by the AI field in the late 1970s succinctly:
The most difficult tasks to automate, for which computer performance to date has been most disappointing, are those that humans do most naturally, such as seeing, hearing and common sense reasoning. A major reason for the difficulty has become very clear to me in the course of my work on computer vision. It is simply that the machines with which we are working are still a hundred thousand to a million times too slow to match the performance of human nervous systems in those functions for which humans are specially wired. This enormous discrepancy is distorting our work, creating problems where there are none, making others impossibly difficult, and generally causing effort to be misdirected.30
He first outlined his disagreement with McCarthy in 1975 in the SAIL report “The Role of Raw Power in Intelligence.”31 It was a powerful manifesto that steeled his faith in the exponential increase in processing power and simultaneously convinced him that the current limits were merely a temporary state of affairs. The lesson he drew early on, and to which he would return throughout his career, was that if you were stymied as an AI designer, just wait a decade and your problems would be solved by the inexorable increase in computing performance. In a 1978 essay for the science-fiction magazine Analog, he laid out his argument for a wider public. Indeed, in the Analog essay he still retained much of McCarthy’s original faith that machines would cross the level of human intelligence in about a decade: “Suppose my projections are correct, and the hardware requirements for human equivalence are available in 10 years for about the current price of a medium large computer,” he asked. “What then?”32 The answer was obvious. Humans would be “outclassed” by the new species we were helping to evolve.
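The arithmetic behind "just wait a decade" can be made concrete with a rough sketch. The speed gap of a hundred thousand to a million comes from Moravec's own statement quoted above; the doubling period is not given in the text, so the 1.5-year figure used below is purely an illustrative assumption, and the function names are hypothetical.

```python
import math


def years_to_close_gap(speed_gap: float, doubling_years: float) -> float:
    """Years of steady exponential growth needed to gain a factor of speed_gap.

    speed_gap      -- how many times too slow the machine is (e.g. 1e5)
    doubling_years -- ASSUMED time for computing power to double
    """
    if speed_gap < 1 or doubling_years <= 0:
        raise ValueError("speed_gap must be >= 1 and doubling_years > 0")
    # Each doubling gains a factor of 2, so log2(gap) doublings are needed.
    return doubling_years * math.log2(speed_gap)


def implied_doubling_years(speed_gap: float, horizon_years: float) -> float:
    """Doubling period that would be required to close speed_gap in horizon_years."""
    if speed_gap <= 1 or horizon_years <= 0:
        raise ValueError("speed_gap must be > 1 and horizon_years > 0")
    return horizon_years / math.log2(speed_gap)


if __name__ == "__main__":
    ASSUMED_DOUBLING_YEARS = 1.5  # illustrative assumption, not from the text
    for gap in (1e5, 1e6):
        wait = years_to_close_gap(gap, ASSUMED_DOUBLING_YEARS)
        needed = implied_doubling_years(gap, 10)
        print(f"gap {gap:.0e}: wait about {wait:.1f} years; "
              f"a 10-year horizon needs doubling every {needed:.2f} years")
```

Under that assumed doubling period the wait comes out at well over two decades, not one. Closing a millionfold gap in ten years would require power to double roughly every six months. This may help explain why, as the chapter goes on to note, only the timeline in Moravec's argument has had to change.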
After leaving Stanford in 1980, Moravec would go on to write two popular books sketching out the coming age of intelligent machines. Mind Children: The Future of Robot and Human Intelligence (1988) contains an early detailed argument that the robots that he has loved since childhood are in the process of evolving into an independent intelligent species. A decade later he refined the argument in Robot: Mere Machine to Transcendent Mind (1998).
Significantly, although it is not widely known, Doug Engelbart had made the same observation, that computers would increase in power exponentially, at the dawn of the interactive computing age in 1960.33 He used this insight to launch the SRI-based augmentation research project that would help lead ultimately to both personal computing and the Internet. In contrast, Moravec built on his lifelong romance with robots. Though he has tempered his optimism, his overall faith has never wavered. During the 1990s, in addition to writing his second book, he took two sabbaticals in an effort to hurry the process of perfecting the ability to permit machines to see and understand their environments so they could navigate and move freely.
The first sabbatical he spent in Cambridge, Massachusetts, at Danny Hillis’s Thinking Machines Corporation, where Moravec hoped to take advantage of a supercomputer. But the new supercomputer, the CM-5, wasn’t ready. So he contented himself with refining his code on a workstation while waiting for the machine. By the end of his stay, he realized that he only needed to wait for the power of a supercomputer to come to his desktop rather than struggle to restructure his code so it would run on a special-purpose machine. A half decade later, on a second sabbatical at a Mercedes-Benz research lab in Berlin, he again had the same realization.
Moravec still wasn’t quite willing to give up and so after coming back from Germany he took a DARPA contract to continue work on autonomous mobile robotic software. But after writing two best-selling books over a decade arguing for a technological promised land, he decided it was really time to settle down and do something about it. The idea that the exponential increase of computing power would inevitably lead to artificially intelligent machines was becoming more deeply ingrained in Silicon Valley, and a slick packaging of the underlying argument was delivered in 2005 by Ray Kurzweil’s The Singularity Is Near. “It was becoming a spectacle and it was interfering with real work,” he decided. By now he had taken to heart Alan Kay’s dictum that “the best way to predict the future is to invent it.”
His computer cave is miles from the offices of Seegrid, the robotic forklift company he founded in 2003, but within walking distance of his Pittsburgh home. For the past decade he has given up his role as futurist and become a hermit. In a way, it is the continuation of the project he originally began as a child. Growing up in Canada, at age ten Moravec had built his first robot from tin cans, batteries, lights, and a motor. Later, in high school, he went on to build a robotic turtle capable of following a light and a robotic hand. At Stanford, he became the force behind the Stanford Cart project, a mobile robot with a TV camera that could negotiate obstacle courses. He had inherited the Cart system when he arrived at Stanford in 1971 and then gradually rebuilt the entire system.
Shakey was the first autonomous robot, but the Stanford Cart, with a long and colorful history of its own, is the true predecessor of the autonomous car. It had first come to life as a NASA-funded project in the mechanical engineering department in 1960, based on the idea that someday a vehicle would be remotely driven on the surface of the moon. The challenge was how to control such a vehicle given the 2.7-second propagation delay defining the round-trip radio signal between the Earth and the moon.
Funding for the initial project was rejected because the logic of keeping a human in the loop had won out. When in 1962 President Kennedy committed the nation to the manned exploration of the moon, the original Cart was shelved34 as unnecessary. The robot, about the size of a card table with four bicycle wheels, sat unused until in 1966 SAIL’s deputy director Les Earnest rediscovered it. He persuaded the mechanical engineering department to lend it to SAIL to experiment in making an autonomous vehicle. Eventually, using the computing power of the SAIL mainframe, a graduate student was able to program the robot to follow a white line on the floor at a speed of less than one mile per hour. A radio control link enabled remote operation. Tracking would have been simpler with two photocell sensors, but a video camera connected to a computer was seen as a tour de force at the time.
Moravec would modify and hack the system for a decade so that ultimately it would be able to make it across a room, correctly navigating an obstacle course about half the time. The Cart failed in many ways. Attempting to simultaneously map and locate using only single-camera data, Moravec had undertaken one of the hardest problems in AI. His goal was to build an accurate three-dimensional model of the world as a key step toward understanding it.
At the time the only feedback came from seeing how far the Cart had moved. It didn’t have true stereoscopic vision, so the Cart lacked depth perception. As a cost-saving measure, he would move the camera back and forth along a bar at right angles to the field of view, making it possible for the software to calculate a stereo view from a single camera. It was an early predecessor of the software approach taken decades later by the Israeli computer vision company Mobileye.
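The geometry behind recovering depth from a camera slid sideways along a bar can be sketched with the textbook pinhole-camera relation: depth equals focal length times baseline divided by disparity. This is only an illustration of the general principle, not a reconstruction of Moravec's software; the function name and every number below are hypothetical.

```python
def depth_from_disparity(focal_px: float, baseline_m: float,
                         disparity_px: float) -> float:
    """Distance to a point seen from two camera positions on a sliding bar.

    focal_px     -- focal length expressed in pixels (assumed known)
    baseline_m   -- how far the camera was moved along the bar, in meters
    disparity_px -- horizontal shift of the point between the two images
    Assumes an ideal pinhole camera moved exactly perpendicular to its
    viewing direction, with the same point correctly matched in both images.
    """
    if focal_px <= 0 or baseline_m <= 0:
        raise ValueError("focal length and baseline must be positive")
    if disparity_px <= 0:
        raise ValueError("zero disparity means the point is effectively at infinity")
    return focal_px * baseline_m / disparity_px


if __name__ == "__main__":
    # Hypothetical values: 500-pixel focal length, camera slid 0.5 m.
    for shift in (100.0, 50.0, 10.0):
        z = depth_from_disparity(500.0, 0.5, shift)
        print(f"image shift {shift:5.1f} px -> depth {z:5.1f} m")
```

Nearby objects shift a great deal between the two camera positions and distant ones hardly at all, which is why a single camera on a bar can stand in for a pair of eyes. The hard part, which this sketch assumes away, is reliably finding the same point in both images.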
Driving automatically was rather slow and boring, and so with its remote link and video camera connection Moravec enjoyed controlling the Cart remotely from his computer workstation. It all seemed very futuristic to pretend that he was at the controls of a lunar rover, wandering around SAIL, which was housed in a circular building in the hills west of Stanford. Before long the driveway leading up to the lab was sporting a yellow traffic sign that read CAUTION ROBOTIC VEHICLE. The Cart did venture outside but not very successfully. Indeed it seemed to have a propensity to find trouble. One setback occurred in October of 1973 when the Cart, being driven manually, ran off an exit ramp, tipped over, leaked acid from a battery, and in the process destroyed precious electronic circuitry.35 It took almost a year to rebuild.
Moravec would often try to drive the Cart around the building, but toward the rear of the lab the road dipped, causing the radio signal to weaken and making it hard to see exactly where the Cart was. Once, while the Cart was circling the building, he misjudged its location and made a wrong turn. Rather than returning in a circle, the robot headed down the driveway to busy Arastradero Road, which runs through the Palo Alto foothills. Moravec kept waiting for the signal from the robot to improve, but it stayed hazy. The television image was filled with static. Then to his surprise, he saw a car drive by the robot. That seemed odd. Finally he got up from his computer terminal and went outside to track the robot down physically. He walked to where he assumed the robot would be, but found nothing. He decided that someone was playing a joke on him. Finally, as he continued to hunt for the errant machine it came rolling back up the driveway with a technician sitting on it. The Stanford Cart had managed to make its way far down Arastradero Road and was driving away from the lab by the time it was captured. Since those baby steps, engineers have made remarkable progress in designing self-driving cars. Moravec’s original premise that it is only necessary to wait for computing to fall in cost and grow more powerful has largely proven true.
He has quietly continued to pursue machine vision technology, but there have been setbacks. In October of 2014, his factory vision start-up declared bankruptcy and underwent a court-ordered restructuring. Despite the disappointments, only the timeline in his agenda has changed. To the question of whether the current wave of artificial intelligence and robotics will replace human labor, he responds with a twinkle that what he is about is replacing humanity: “labor is such a minimalist goal.”
He originally sketched out his vision of the near future in his second book, Robot: Mere Machine to Transcendent Mind. Here, Moravec concludes there is no need to replace capitalism because it is worthwhile for evolving machines to compete against each other. “The suggestion,” he said, “is in fact that we engineer a rather modest retirement for ourselves.” The specter of the “end of labor,” which today is viewed with growing alarm by many technologists, is a relatively minor annoyance in Moravec’s worldview. Humans are adept at entertaining each other. Like many of his Singularitarian brethren, he instead wonders what we will do with a superabundance of all of society’s goods and services. Democracy, he suggests, provides a path to sharing the vast accumulation of capital that will increasingly come from superproductive corporations. It would be possible, for example, to increase Social Security payments and lower the retirement age until it eventually equals birth.
In Moravec’s worldview, augmentation is an interim stage of technology development, only necessary during the brief period when humans can still do things that the machines can’t. Like Licklider he assumes that machines will continue to improve at a faster and faster rate, while humans will evolve only incrementally. Not by 2020—and at one point he believed 2010—but sometime reasonably soon thereafter so-called universal robots will arrive that will be capable of a wide set of basic applications. It was an idea he first proposed in 1991, and only the timeline has been altered. At some point these machines will improve to the point where they will be able to learn from experience and gradually adapt to their surroundings. He still retains his faith in Asimov’s three laws of robotics. The market will ensure that robots behave humanely—robots that cause too many deaths simply won’t sell very well. And at some point, in Moravec’s view, machine consciousness will emerge as well.
Also in Robot: Mere Machine to Transcendent Mind, he argues that strict laws be passed and applied to fully automated corporations. The laws would limit the growth of these corporations—and the robots they control—and prohibit them from taking too much power. If they grow too large, an automatic antitrust mechanism will go into effect, forcing them to divide. In Moravec’s future world, rogue corporations will be held in check by a society of AI-based corporations, working to protect the common good. There is nothing romantic in his worldview: “We can’t get too sentimental about the robots, because unlike human beings the robots don’t have this evolutionary history where their own survival is really the most important thing,” he said. He still holds to the basic premise—the arrival of the AI-based corporation and the universal robot will mark a utopia that will satisfy every human desire.
His worldview isn’t entirely utopian, however. There is also a darker framing to his AI/robot future. Robots will be expanding into the solar system, mining the asteroids and reproducing and building copies of themselves. This is where his ideas begin to resemble Blade Runner—the dystopian Ridley Scott movie in which androids have begun to colonize the solar system. “Something can go wrong, you will have rogue robots out there,” he said. “After a while you will end up with an asteroid belt and beyond that is full of wildlife that won’t have the mind-numbing restrictions that the tame robots on Earth have.” Will we still need a planetary defense system to protect us from our progeny? Probably not, he reasons. This new technological life-form will be more interested in expanding into the universe—hopefully.
From his cozy and solitary command center in suburban Pittsburgh in a room full of computer screens it is easy to buy into Moravec’s science-fiction vision. So far, however, there is precious little solid evidence that there will be a rapid technological acceleration that will bring about the AI promised land in his lifetime. Despite the reality that we don’t yet have self-driving cars and the fact that he has been forced to revise the timing of his estimates, he displays the curves on his giant computer screens and remains firm in his basic belief that society is still on track to create its successor species.
Will humans join in this grand adventure? Although he proposed the idea of uploading a human mind into a computer in Mind Children, Moravec is not committed to Ray Kurzweil’s goal of “living long enough to live forever.” Kurzweil is undergoing extraordinary and questionable medical procedures to extend his life. Moravec confines his efforts to survive until 2050 to eating well and walking frequently, and at the age of sixty-four, he does perhaps stand a plausible chance of surviving that long.
During the 1970s and 1980s the allure of artificial intelligence would draw a generation of brilliant engineers, but it would also disappoint. When AI failed to deliver on its promise they would frequently turn to the contrasting ideal of intelligence augmentation.
Sheldon Breiner grew up in a middle-class Jewish family in St. Louis and, from an early age, was extraordinarily curious about virtually everything he came in contact with. He chose college at Stanford during the 1950s, in part to put as much distance as possible between himself and the family bakery. He wanted to see the world, and even in high school realized that if he stayed in St. Louis his father would likely compel him to take over the family business.
After graduating he traveled in Europe, spent some time in the army reserve, and then came back to Stanford to become a geophysicist. Early on he had become obsessed with the idea that magnetic forces might play a role in either causing or perhaps predicting earthquakes. In 1962 he had taken a job at Varian Associates, an early Silicon Valley firm making a range of magnetometers. His assignment was to find new applications for these instruments that could detect minute variations in the Earth’s magnetic field. Varian was the perfect match for Breiner’s 360-degree intelligence. For the first time highly sensitive magnetometers were becoming portable, and there was a willing market for clever new applications that would range from finding oil to airport security. Years later Breiner would become something of a high-tech Indiana Jones, using the technology to explore archaeological settings. In Breiner’s expert hands, Varian magnetometers would find avalanche victims, buried treasure, missing nuclear submarines, and even buried cities. Early on he conducted a field experiment from a site behind Stanford, where he measured the electromagnetic pulse (EMP) generated by a 1.4-megaton nuclear detonation 250 miles above the Earth. The classified test, known as Starfish Prime, led to new understanding about the impact of nuclear explosions on Earth-based electronics.
For his 1967 doctoral thesis he set out to explore the question of whether minute changes in the huge magnetic forces deep in the Earth could play a role in predicting earthquakes. He set up an array of magnetometers in individual trailers along a 120-mile stretch on the San Andreas Fault and used phone lines to send the data back to a laboratory in an old shack on the Stanford campus. There he installed a pen plotter that would record signals from the various magnetometers. It was an ungainly device that pushed rather than pulled a roll of chart paper under five colored ink pens. He hired a teenager from a local high school to change the paper and time-stamp the charts, but the device was so flawed that it caused the paper to ball up in huge piles almost every other day. He redesigned the system around a new digital printer from Hewlett-Packard and the high school student, who had been changing the paper for a dollar a day, was an early automation casualty.
Later, Breiner was hired by Hughes Corp. to work on the design of a deep ocean magnetometer to be towed by the Glomar Explorer. The cover story was to hunt for minerals such as manganese nodules on the seabed at depths of ten thousand to twelve thousand feet. A decade later the story leaked that the actual mission was a Central Intelligence Agency operation to find and raise a sunken Soviet submarine from the bottom of the Pacific Ocean. In 1968, after the assassination of Robert Kennedy, Breiner was asked by the White House science advisor to demonstrate technology to detect hidden weapons. He went to the Executive Office Building and demonstrated a relatively simple scheme employing four magnetometers that would become the basis for the modern metal detectors still widely used in airports and public buildings.36
Ultimately Breiner was able to demonstrate evidence of magnetic variation along the fault associated with earthquakes, but the data was clouded by geomagnetic activity and his hypothesis did not gain wide acceptance. He didn’t let the lack of scientific certainty hold him back. At Varian he had been paid to think up a wide range of commercial applications for magnetometers and in 1969 he and five Varian colleagues founded Geometrics, a company that used airborne magnetometers to prospect for oil deposits.
He would run his oil prospecting company for seven years before selling to Edgerton, Germeshausen, and Grier (EG&G), and then work for seven more years at their subsidiary before leaving in 1983. By then, the artificial intelligence technology that had been pioneered in John McCarthy’s SAIL and in the work that Feigenbaum and Lederberg were doing to capture and bottle human expertise was beginning to leak out into the surrounding environment in Silicon Valley. A Businessweek cover story in July of 1984 enthused, “Artificial Intelligence—It’s Here!” Two months later on CBS Evening News Dan Rather gave glowing coverage to the SRI work in developing expert systems to hunt for mineral deposits. Bathed in the enthusiasm, Breiner would become part of a wave of technology-oriented entrepreneurs who came to believe that the time was ripe to commercialize the field.
The earlier work on Dendral, begun in 1965, had led to a cascade of similar systems. Mycin, also produced at Stanford, was based on an “inference engine” that did if/then-style logic and a “knowledge base” of roughly six hundred rules to reason about blood infections. At the University of Pittsburgh during the 1970s a program called Internist-I was another early effort to tackle the challenge of disease diagnosis and therapy. In 1977 at SRI, Peter Hart, who began his career in artificial intelligence working on Shakey the robot, and Richard Duda, another pioneering artificial intelligence researcher, built Prospector to aid in the discovery of mineral deposits. That work would eventually get CBS’s overheated attention. In the midst of all of this, in 1982, Japan announced its Fifth Generation Computer program. Heavily focused on artificial intelligence, it added an air of competition and inevitability to the AI boom that would lead to a market in which newly minted Ph.D.s could command unheard-of $30,000 annual salaries right out of school.
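The split between an “inference engine” and a “knowledge base” described for Mycin can be made concrete with a small sketch. What follows is only an illustration under loud assumptions: the three rules are invented for the example and are not drawn from any real system, and the engine uses simple forward chaining, whereas Mycin itself is generally described as reasoning backward from goals and attaching certainty factors to its conclusions. The names `Rule`, `KNOWLEDGE_BASE`, and `infer` are hypothetical.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    """One if/then rule: if every condition is known, conclude something new."""
    conditions: frozenset
    conclusion: str


# Hypothetical toy knowledge base. These rules are invented for illustration
# and do not come from Mycin, Prospector, or any other real expert system.
KNOWLEDGE_BASE = [
    Rule(frozenset({"dish_is_beef"}), "wants_full_bodied"),
    Rule(frozenset({"wants_full_bodied", "guest_prefers_red"}), "suggest_cabernet"),
    Rule(frozenset({"dish_is_fish"}), "wants_light_bodied"),
]


def infer(facts, rules):
    """Forward-chaining inference: keep firing rules until nothing new is learned."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            # A rule fires when all of its conditions are already known facts.
            if rule.conditions <= known and rule.conclusion not in known:
                known.add(rule.conclusion)
                changed = True
    return known


if __name__ == "__main__":
    print(sorted(infer({"dish_is_beef", "guest_prefers_red"}, KNOWLEDGE_BASE)))
```

The point of the design is that the engine never changes; only the list of rules grows. That is why the number of rules, rather than the cleverness of the program, became the measure of such a system.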
The genie was definitely out of the bottle. Developing expert systems was becoming a discipline called “knowledge engineering”—the idea was that you could package the expertise of a scientist, an engineer, or a manager and apply it to the data of an enterprise. The computer would effectively become an oracle. In principle that technology could be used to augment a human, but software enterprises in the 1980s would sell it into corporations based on the promise of cost savings. As a productivity tool its purpose was as often as not to displace workers.
Breiner looked around for industries where it might be easy to package the knowledge of human experts and quickly settled on commercial lending and insurance underwriting. At the time there was no widespread alarm about automation and he didn’t see the problem framed in those terms. The computing world was broken down into increasingly inexpensive personal computers and more costly “workstations,” generally souped-up machines for computer-aided design applications. Two companies, Symbolics and Lisp Machines, Inc., spun directly out of the MIT AI Lab to focus on specialized computers running the Lisp programming language, designed for building AI applications.
Breiner founded his own start-up, Syntelligence. Along with Teknowledge and Intellicorp, it would become one of the three high-profile artificial intelligence companies in Silicon Valley in the 1980s. He went shopping for artificial intelligence talent and hired Hart and Duda from SRI. The company created its own programming language, Syntel, which ran on an advanced workstation used by the company’s software engineers. It also built two programs, Underwriting Advisor and Lending Advisor, which were intended for use on IBM PCs. He positioned the company as an information utility rather than as an artificial intelligence software publisher. “In every organization there is usually one person who is really good, who everybody calls for advice,” he told a New York Times reporter writing about the emergence of commercial expert systems. “He is usually promoted, so that he does not use his expertise anymore. We are trying to protect that expertise if that person quits, dies or retires and to disseminate it to a lot of other people.” The article, about the ability to codify human reasoning, ran on the paper’s front page in 1984.37
When marketing his loan expert and insurance expert software packages, Breiner demonstrated dramatic, continuing cost savings for customers. The idea of automating human expertise was compelling enough that he was able to secure preorders from banks and insurance companies and investments from venture capital firms. AIG, St. Paul, and Fireman’s Fund as well as Wells Fargo and Wachovia advanced $6 million for the software. Breiner stuck with the project for almost a half decade, ultimately growing the company to more than a hundred employees and pushing revenues to $10 million annually. The problem was that this growth wasn’t fast enough for his investors. In 1983 the five-year projections had called for $50 million in annual revenue. When the commercial market for artificial intelligence software failed to materialize quickly enough, he struggled inside the company, most bitterly with board member Pierre Lamond, a venture capitalist who was a veteran of the semiconductor industry with no software experience. Ultimately Breiner lost his battle and Lamond brought in an outside corporate manager who moved the company headquarters to Texas, where the manager lived.
Syntelligence itself would confront directly what would become known as the “AI Winter.” One by one the artificial intelligence firms of the early 1980s were eclipsed either because they failed financially or because they returned to their roots as experimental efforts or consulting companies. The market failure became an enduring narrative that came to define artificial intelligence, with a repeated cycle of hype and failure fueled by overly ambitious scientific claims that are inevitably followed by performance and market disappointments. A generation of true believers, steeped in the technocratic and optimistic artificial intelligence literature of the 1960s, clearly played an early part in the collapse. Since then the same boom-and-bust cycle has continued for decades, even as AI has advanced.38 Today the cycle is likely to repeat itself as a new wave of artificial intelligence technologies is being heralded by some as being on the cusp of offering “thinking machines.”
The first AI Winter had actually come a decade earlier in Europe. Sir Michael James Lighthill, a British applied mathematician, led a study in 1973 that excoriated the field for not delivering on its promises and predictions, such as the early SAIL prediction of a working artificial intelligence in a decade. Although it had little impact in the United States, the Lighthill report, “Artificial Intelligence: A General Survey,” led to the curtailment of funding in England and a dispersal of British researchers from the field. As a footnote to the report, the BBC arranged a televised debate on the future of AI in which the targets of Lighthill’s criticism were given a forum to respond. John McCarthy was flown in for the event but was unable to offer a convincing defense of his field.
A decade later a second AI Winter would descend in the United States, beginning in 1984, when Breiner managed to push Syntelligence sales to $10 million before departing. There had been warnings of “irrational exuberance” for several years when Roger Schank and Marvin Minsky raised the issue early on at a technical conference, claiming that emerging commercial expert systems contained no significant technical advances from work that had begun two decades earlier.39 The year 1984 was also when Doug Engelbart’s and Alan Kay’s augmentation ideas dramatically came within the reach of every office worker. Needing a marketing analogy to frame the value of the personal computer with the launch of the Macintosh, Steve Jobs hit on the perfect metaphor for the PC. It was a “bicycle for our minds.”
Pushed out of the company he had founded, Breiner went on to his next venture, a start-up company designing software for Apple’s Macintosh. From the 1970s through the 1980s it was a path followed by many of Silicon Valley’s best and brightest.
Beginning in the 1960s, the work that had been conducted quietly at the MIT and Stanford artificial intelligence laboratories and at the Stanford Research Institute began to trickle out into the rest of the world. The popular worldview of robotics and artificial intelligence had originally been given form by literary works: the mythology of the Prague Golem, Mary Shelley’s Frankenstein, and Karel Čapek’s pathbreaking R.U.R. (Rossum’s Universal Robots), all posing fundamental questions about the impact of robotics on human life. However, as America prepared to send humans to the moon, a wave of technology-rich and generally optimistic science fiction appeared from writers like Isaac Asimov, Robert Heinlein, and Arthur C. Clarke. HAL, the run-amok sentient computer in Clarke’s 2001: A Space Odyssey, not only had a deep impact on popular culture, it changed people’s lives. Even before he began as a graduate student in computer science at the University of Pennsylvania, Jerry Kaplan knew what he planned to do. The film version of 2001 was released in the spring of 1968, and over the summer Kaplan watched it six times. With two of his friends he went back again and again and again. One of his friends said, “I’m going to make movies.” And he did; he became a Hollywood director. The other friend became a dentist, and Kaplan went into AI.
“I’m going to build that,” he told his friends, referring to HAL. Like Breiner, he would become instrumental as part of the first generation to attempt to commercialize AI, and also like Breiner, when that effort ran aground in the AI Winter, he would turn to technologies that augmented humans instead.
As a graduate student Kaplan had read Terry Winograd’s SHRDLU tour de force on interacting with computers via natural language. It gave him a hint about what was possible in the world of AI as well as a path toward making it happen. Like many aspiring computer scientists at the time, he would focus on understanding natural language. A math whiz, he was one of a new breed of computer nerds who weren’t just pocket-protector-clad geeks, but who had a much broader sense of the world.
After he graduated with a degree in the philosophy of science from the University of Chicago, he followed a girlfriend to Philadelphia. An uncle hired him to work in the warehouse of his wholesale pharmaceuticals business while grooming him to one day take over the enterprise. Dismayed by the claustrophobic family business, he soon desperately needed to do something different and he remembered both a programming class he had taken at Chicago and his obsession with A Space Odyssey. He enrolled as a graduate student in computer science at the University of Pennsylvania. Once there he studied with Aravind Krishna Joshi, an early specialist in computational linguistics. Even though he had come in with a liberal arts background he quickly became a star. He went through the program in five years, getting perfect scores in all of his classes and writing his graduate thesis on the subject of building natural language front ends to databases.
As a newly minted Ph.D., Kaplan gave job audition lectures at Stanford and MIT, visited SRI, and spent an entire week being interviewed at Bell Labs. Both the telecommunications and computer industries were hungry for computer science Ph.D.s and on his first visit to Bell Labs he was informed that the prestigious lab had a target of hiring 250 Ph.D.s, and had no intention of hiring below average. Kaplan couldn’t help pointing out that 250 was more than the entire number of computer science Ph.D.s that the United States would produce that year. He picked Stanford, after Ed Feigenbaum had recruited him as a research associate in the Knowledge Engineering Laboratory. Stanford was not as intellectually rigorous as Penn, but it was a technological paradise. Silicon Valley had already been named, the semiconductor industry was under assault from Japan, and Apple Computer was the nation’s fastest-growing company.
There was free food at corporate and academic events every evening and no shortage of “womanizing” opportunities. He bought a home in Los Trancos Woods several miles from Stanford, near SAIL, which was just in the process of moving from the foothills down to a new home on the central Stanford campus.
When he arrived at Stanford in 1979 the first golden age of AI was in full swing—graduate students like Douglas Hofstadter, the author of Gödel, Escher, Bach: An Eternal Golden Braid; Rodney Brooks; and David Shaw, who would later take AI techniques and transform them into a multibillion-dollar hedge fund on Wall Street, were all still around. The commercial forces that would lead to the first wave of AI companies like Intellicorp, Syntelligence, and Teknowledge were now taking shape. While Penn had been like an ivory castle, the walls between academia and the commercial world were coming down at Stanford. There was wheeling and dealing and start-up fever everywhere. Kaplan’s officemate, Curt Widdoes, would soon take the software used to build the S1 supercomputer with him to cofound Valid Logic Systems, an early electronic design automation company. They used newly developed Stanford University Network (SUN) workstations. Graduate student Andy Bechtolsheim—sitting in the next room—had designed the original SUN hardware, and would soon cofound Sun Microsystems, thus commercializing the hardware he had developed as a graduate student.
Kaplan rapidly became a “biz-dev” guy. It was in the air. He had an evening consulting gig developing the software for what would become Synergy, the first all-digital music keyboard synthesizer. It was chock-full of features that have become standard on modern synthesizers, and was used to produce the soundtrack for the movie Tron. Like everyone at Stanford, he was making money on the side. They were all starting companies. There was a guy in the basement, Leonard Bosack, who was trying to figure out how to interconnect computers and would eventually found Cisco Systems with his wife, Sandy Lerner, to make the first network routers.
Kaplan had a research associate job at Stanford, which was great. It was equivalent to a non-tenure track teaching position, but without the pain of having to teach. There was, however, a downside. Research staff were second-class citizens to academic faculty. He was treated like the hired help, even though he could write code and do serious technical work. His role was like Scotty, the reliable engineer on the starship Enterprise in Star Trek. He was the person who made things work. Fueled in part by the Reagan-era Strategic Defense Initiative, vast new investments were being made in artificial intelligence. It was military-led spending, but it wasn’t entirely about military applications. Corporate America was toying with the idea of expert systems. Ultimately the boom would lead to forty start-up companies and U.S. sales of AI-related hardware and software of $425 million in 1986. As an academic, Kaplan lasted just two years at Stanford. He received two offers to join start-ups at the same time, both in the AI world. Ed Feigenbaum, who had decided that the Stanford computer scientists should get paid for what they were already doing academically, was assembling one of the start-ups, Teknowledge. The new company would rapidly become the Cadillac of expert system consulting, also developing custom products. The other start-up was called Symantec. Decades later it would become a giant computer security firm, but at the outset Symantec began with an AI database program that overlapped with Kaplan’s technical expertise.
It was a time when Kaplan seemed to have an unlimited capacity to work. He wasn’t a big partier, he didn’t like being interrupted, and he viewed holidays as a time to get even more accomplished. Gary Hendrix, a respected natural language researcher at SRI, approached him to help with the programming of an early demo version of a program called Q&A, the first natural language database. The idea was that unskilled users would be able to retrieve information by posing queries in normal sentences. There was no money, only a promise of stock if the project took off.
Kaplan’s expertise was on natural language front ends that would allow typed questions to an expert system. What Hendrix needed, however, was a simple database back end for his demonstration. And so over a Christmas holiday at the end of 1980, Kaplan sat down and programmed one. The entire thing initially ran on an Apple II. He did it on a contingent basis and in fact he didn’t get rich. The first Symantec never went anywhere commercially and the venture capitalists did a “cram down,” a financial maneuver in which company founders often see their equity lose value in exchange for new investments. As a result, what little stock Kaplan owned was now worthless.
In the end he left Stanford and joined Teknowledge because he admired Lee Hecht, the University of Chicago physicist and business school professor who had been brought in to be CEO and provide adult supervision for the twenty Stanford AI refugees who were the Teknowledge shock troops. “Our founders have build [sic] more expert systems than anyone else,” Hecht told Popular Science in 1982.40 Teknowledge set up shop at the foot of University Ave., just off the Stanford campus, but soon moved to flashier quarters farther down the street in the one high-rise in downtown Palo Alto. In the early 1980s the office had a sleek modernist style that leaned heavily toward black.
The state-of-the-art office offered a clear indication that the new AI programs wouldn’t be cheap. Just one rule for one of the expert systems would require an interviewer to spend an hour with a human expert, and a working expert system would consist of five hundred rules or more. A complete system might cost as much as $4 million to build, but Hecht, like Breiner, believed that by bottling human expertise, corporations could reap vast savings over time. A complete system might save a manufacturer as much as $100 million annually, he told the magazine. An oil company expert system that they were prototyping might save as much as $1,000 per well per day, Hecht claimed. In the article Feigenbaum also asserted that the bottleneck would be broken when computers themselves began automatically interviewing experts.41 Hecht saw more than a hacker in Kaplan and made him a promise—if he came to Teknowledge he would teach him how to run a business. He jumped at the chance. His office was adjacent to Hecht’s and he set out to build a next-generation consulting firm whose mission was to replace the labor of human experts with software.
However, in the beginning Kaplan knew nothing about the art of selling high-technology services. He was put in charge of marketing and the first thing he did was prepare a brochure describing the firm’s services. Coming from an entirely academic background, he put together a trifold marketing flyer that was intended to attract corporate customers to a series of seminars on how to build an expert system, featuring Feigenbaum as the star speaker. He sent out five thousand brochures. You were supposed to get a 2 percent response rate. Instead of a hundred responses, they got just three, and one was from a guy who thought they were teaching about artificial insemination. It was a rude shock for a group of AI researchers, confident that they were about to change the world overnight, to learn that outside of the university nobody had heard of artificial intelligence. Eventually, they were able to pull together a small group of mostly large and defense-oriented corporations, making it possible for Hecht to say that there had “been inquiries from more than 50 major companies from all over the world,” and Teknowledge was able to do $1 million in business in two months at the beginning of 1982.42
It was indeed a Cadillac operation. They wrote the programs in Lisp on fancy $20,000 Xerox Star workstations. Worse, the whole operation was buttressed by just a handful of marketers led by Kaplan. The Teknowledge worldview was, “We’re smart, we’re great, people should just give us money.” It was completely backward, and besides, the technology didn’t really work. Despite the early stumbles, however, they eventually attracted attention. One day the king of Sweden even came to visit. True to protocol his arrival had all the trappings of a regal entourage. The Secret Service showed up first to inspect the office, including the bathroom. The assembled advance team appeared to be tracking the king in real time as they waited. Kaplan was standing breathlessly at the door when a small, nondescript gentleman in standard Silicon Valley attire—business casual—walked in unaccompanied and innocently said to the young Teknowledge executive, “Where should I sit?” Flustered, Kaplan responded, “Well, this is a really bad time because we’re waiting for the king of Sweden at the moment.” The king interrupted him. “I am the king of Sweden.” The king turned out to be perfectly tech savvy: he had a deep understanding of what they were trying to do, more so than most of their prospective customers—which, of course, was at the heart of the challenge that they faced.
There was, however, one distinct upside for Kaplan. He was invited to an evening reception for the king held at the Bohemian Club in San Francisco. He arrived and fell into conversation with a beautiful Swedish woman. They spoke for almost an hour and Kaplan thought that maybe she was the queen. As it turned out, she was a stewardess who worked for the Swedish airline that flew the royal entourage to the United States. The joke cut both ways, because she thought he was Steve Jobs. There was a happy ending. They would date for the next eight years.
Teknowledge wasn’t so lucky. The company had a good dose of “The Smartest Guys in the Room” syndrome. With a who’s who of some of the best engineers in AI, they had captured the magic of the new field and for what might otherwise pass for exorbitant consulting fees they would impart their alchemy. However, artificial intelligence systems at the time were little more than accretions of if-then-else statements packaged in overpriced workstations and presented with what were then unusually large computer displays with alluring graphical interfaces. In truth, they were more smoke and mirrors than canned expertise.
It was Kaplan himself who would become something of a Trojan horse within the company. In 1981 the IBM PC had legitimized personal computers and dramatically reduced their cost while expanding the popular reach of computing. Doug Engelbart and Alan Kay’s intelligence augmentation—IA—meme was showing up everywhere. Computing could be used to extend or replace people, and the falling cost made it possible for software designers to take either path. Computing was now sneaking out from behind the carefully maintained glass wall of the corporate data center and showing up in the corporate office supplies budget.
Kaplan was quick to grasp the implications of the changes. Larry Tesler, a former SAIL researcher who would work for Steve Jobs in designing the Lisa and the Macintosh and help engineer the Newton for John Sculley, had the same early epiphany. He had tried to warn his coworkers at Xerox PARC that cheap PCs were going to change the world, but at the time—1975—no one was listening. Six years later, many people still didn’t comprehend the impact of the falling cost of the microprocessor. Teknowledge’s expert system software was then designed and deployed on an overpriced workstation, which cost about $17,000, and a complete installation might run between $50,000 and $100,000. But Kaplan realized that PCs were already powerful enough to run the high-priced Teknowledge software handily. Of course, the business implication was that without their flashy workstation trappings, they would be seen for what they really were—software packages that should sell for PC software prices.
Nobody at Teknowledge wanted to hear this particular heresy. So Kaplan did what he had done a few years earlier when he had briefly helped found Symantec in his spare time at Stanford. It was Christmas, and everyone else was on vacation, so he holed up in his cottage in the hills behind Stanford and went to work rewriting the Teknowledge software to run on a PC. Kaplan used a copy of Turbo Pascal, a lightning-fast Pascal compiler that made his version of the expert system interpreter run faster than the original workstation product. He finished the program over the holidays and came in and demoed the Wine Advisor, the Teknowledge demonstration program, on his “toy” personal computer. It just killed the official software running on the Xerox Star workstation.
All hell broke loose. Not only did it break the Teknowledge business model because software for personal computers was comparatively dirt cheap, but it violated their very sense of their place in the universe! Everyone hated him. Nonetheless, Kaplan managed to persuade Lee Hecht to commit to putting out a product based on the PC technology. But it was crazy. It meant selling a product for $80 rather than $80,000. Kaplan had become the apostate and he knew he was heading for the door. Ann Winblad, who was then working as a Wall Street technology analyst and would later become a well-known Silicon Valley venture capitalist, came by and Kaplan pitched her on the changing state of the computing world.
“I know someone you need to meet,” she told him.
That someone turned out to be Mitch Kapor, the founder and chief executive of Lotus Development Corporation, the publisher of the 1-2-3 spreadsheet program. Kapor came by and Kaplan pitched him on his AI-for-the-masses vision. The Lotus founder was enthusiastic about the idea: “I’ve got money, why don’t you propose a product you want to build for me,” he said.
Kaplan’s first idea was to invent an inexpensive version of the Teknowledge expert system to be called ABC, as a play on 1-2-3. The idea attracted little enthusiasm. Not long afterward, however, he was flying on Kapor’s private jet. The Lotus founder sat down with paper notes and a bulky Compaq computer the size of a sewing machine and began typing. That gave Kaplan a new idea. He proposed a free-form note-taking program that would act as a calendar and a repository for all the odds and ends of daily life. Kapor loved the idea and with Ed Belove, another Lotus software designer, the three men outlined a set of ideas for the program.
Kaplan retreated to his cottage again, this time for a year and a half, writing the code for the program with Belove while Kapor helped with the overall design. Lotus Agenda was the first of a new breed of packaged software, known as a Personal Information Manager, which was in some ways a harbinger of the World Wide Web. Information could be stored in free form and would be automatically organized into categories. It came to be described as a “spreadsheet for words” and it was a classic example of a new generation of software tools that empowered their users in the Engelbart tradition.
Introduced in 1988 to glowing reviews from industry analysts like Esther Dyson, it would go on to gather a cult following. The American AI Winter was just arriving and most of the new wave of AI companies would soon wilt, but Kaplan had been early to see the writing on the wall. Like Breiner, he went quickly from being an AI ninja to a convert to Engelbart’s world of augmentation. PCs were the most powerful intellectual tool in history. It was becoming clear that it was equally possible to design humans into and out of systems being created with computers. Just as AI stumbled commercially, personal computing and thus intelligence augmentation shot ahead. In the late 1970s and early 1980s the personal computer industry exploded on the American scene. Overnight the idea that computing could be both a “fantasy amplifier” at home and a productivity tool at the office replaced the view of computing as an impersonal bureaucratic tool of governments and corporations. By 1982 personal computing had become such a cultural phenomenon that Time magazine put the PC on its cover as “Machine of the Year,” in place of its customary “Man of the Year.”
It was the designers themselves who made the choice of IA over AI. Kaplan would go on to found Go Corp. and design the first pen-based computers that would anticipate the iPhone and iPad by more than a decade. Like Sheldon Breiner, who was also driven away from artificial intelligence by the 1980s AI Winter, he would become part of the movement toward human-centered design in a coming post-PC era.
The quest to build a working artificial intelligence was marked from the outset by false hopes and bitter technical and philosophical quarrels. In 1958, two years after the Dartmouth Summer Research Project on Artificial Intelligence, the New York Times published a brief UPI wire story buried on page 25 of the paper. The headline read NEW NAVY DEVICE LEARNS BY DOING: PSYCHOLOGIST SHOWS EMBRYO OF COMPUTER DESIGNED TO READ AND GROW WISER.43
The article was an account of a demonstration given by Cornell psychologist Frank Rosenblatt, describing the “embryo” of an electronic computer that the navy expected would one day “walk, talk, see, write, reproduce itself and be conscious of its existence.” The device, at this point, was actually a simulation running on the Weather Bureau’s IBM 704 computer that was able to tell right from left after some fifty attempts, according to the report. Within a year, the navy apparently was planning to build a “thinking machine” based on these circuits, for a cost of $100,000.
Dr. Rosenblatt told the reporters that this would be the first device to think “as the human brain,” and that it would make mistakes initially but would grow wiser with experience. He suggested that one application for the new mechanical brain might be as a proxy for space exploration in lieu of humans. The article concluded that the first perceptron, an electronic or software effort to model biological neurons, would have about a thousand electronic “association cells” receiving electrical impulses from four hundred photocells—“eye-like” scanning devices. In contrast, it noted, the human brain was composed of ten billion responsive cells and a hundred million connections with the eyes.
The earliest work on artificial neural networks dates back to the 1940s, and in 1949 that research had caught the eye of Marvin Minsky, then a young Harvard mathematics student, who would go on to build early electronic learning networks, one as an undergraduate and a second one, named the Stochastic Neural Analog Reinforcement Calculator, or SNARC, as a graduate student at Princeton. He would later write his doctoral thesis on neural networks. These mathematical constructs are networks of nodes or “neurons” that are interconnected by numerical values that serve as “weights” or “vectors.” They can be trained by being exposed to a series of patterns such as images or sounds to later recognize similar patterns.
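The idea of training by exposure to patterns can be shown in a few lines. The sketch below is a single artificial neuron trained with the classic perceptron learning rule, offered as an illustration rather than a reconstruction of Rosenblatt’s or Minsky’s hardware. The four-photocell “retina,” the left/right labels, the learning rate, and the function names `predict` and `train` are all assumptions chosen for the example, loosely echoing the 1958 demonstration of telling right from left.

```python
def predict(weights, bias, inputs):
    """Fire (+1) if the weighted sum of the inputs exceeds zero, else -1."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else -1


def train(samples, epochs=20, rate=1.0):
    """Perceptron learning rule: nudge the weights only after a wrong answer."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for inputs, label in samples:
            if predict(weights, bias, inputs) != label:
                # Move each weight toward the correct label for this pattern.
                weights = [w + rate * label * x for w, x in zip(weights, inputs)]
                bias += rate * label
                mistakes += 1
        if mistakes == 0:
            break  # every training pattern is now classified correctly
    return weights, bias


# Hypothetical training set: four photocells in a row, one lit at a time.
# Label -1 means the light is on the left half, +1 on the right half.
SAMPLES = [
    ([1, 0, 0, 0], -1),
    ([0, 1, 0, 0], -1),
    ([0, 0, 1, 0], 1),
    ([0, 0, 0, 1], 1),
]

weights, bias = train(SAMPLES)

if __name__ == "__main__":
    for inputs, label in SAMPLES:
        print(inputs, "expected", label, "got", predict(weights, bias, inputs))
```

Nothing in the program states what “left” or “right” means. The distinction ends up stored entirely in the numerical weights, which is the sense in which such a network is trained rather than programmed.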
During the 1960s a number of competing paths toward building thinking machines emerged, and the dominant direction became the logic- and rule-based approach favored by John McCarthy. However, during the same period, groups around the country were experimenting with competing analog approaches based on the earlier neural network ideas. It’s ironic that Minsky, one of the ten attendees at the Dartmouth conference, would in 1969 precipitate a legendary controversy by writing the book Perceptrons with Seymour Papert, an analysis of neural networks that is widely believed to have stalled neural net research for many years. There is general agreement that as a consequence of the critique set forth in their book, the two MIT artificial intelligence researchers significantly delayed the young research area.
In fact, it was just one of a series of heated intellectual battles within the AI community during the sixties. Minsky and Papert have since argued that the criticism was unfair and that their book was a more balanced analysis of neural networks than was conceded by its critics. This dispute was further complicated by the fact that one of the main figures in the field, Rosenblatt, would die two years later in a sailing accident, leaving a vacuum in research activity into neural nets.
Early neural network research included work done at Stanford University as well as the research led by Charlie Rosen at SRI, but the Stanford group refocused its attention on telecommunications and Rosen would shift his Shakey work toward the dominant AI framework. Interest in neural networks would not reemerge until 1978, with the work of Terry Sejnowski, a postdoctoral student in neurobiology at Harvard. Sejnowski had given up his early focus on physics and turned to neuroscience. After taking a summer course in Woods Hole, Massachusetts, he found himself captivated by the mystery of the brain. That year a British postdoctoral psychologist, Geoffrey Hinton, was studying at the University of California at San Diego under David Rumelhart. The older UC scientist had created the parallel-distributed processing group with Donald Norman, the founder of the cognitive psychology department at the school.
Hinton, who was the great-great-grandson of logician George Boole, had come to the United States as a “refugee” as a direct consequence of the original AI Winter in England. The Lighthill report had asserted that most AI research had significantly underdelivered on its promise, the exception being computational neuroscience. In a Lighthill-BBC televised “hearing,” both sides made their arguments based on the then state-of-the-art performance of computers. Neither side seemed to have taken into account the acceleration of computing speeds described by Moore’s law.
As a graduate student Hinton felt personally victimized by Minsky and Papert’s attacks on neural networks. When he would tell people that he was working on artificial neural networks as a graduate student in England, their response would be, “Don’t you get it? Those things are no good.” His advisor told him to forget his interests and read Terry Winograd’s thesis. It was all going to be symbolic logic in the future. But Hinton was on a different path. He was beginning to form a perspective that he would later describe as “neuro-inspired” engineering. He did not go to the extreme of some in the new realm of biological computing. He thought that slavishly copying biology would be a mistake. Decades later the same issue remains hotly disputed. In 2014 the European Union funded Swiss researcher Henry Markram with more than $1 billion to model a human brain at the tiniest level of detail, and Hinton was certain that the project was doomed to failure.
In 1982 Hinton had organized a summer workshop focusing on parallel models of associative memory, which Terry Sejnowski applied to attend. Independently the young physicist had been thinking about how the brain might be modeled using some of the new schemes that were being developed. It was the first scientific meeting that Hinton had organized. He was aware that the invited crowd had met repeatedly in the past: people he thought of as “elderly professors in their forties” would come and give their same old talks. He drew up a flyer and sent it to his targeted computer science and psychology departments. It offered to pay expenses for those with new ideas. He was predictably disappointed when most of the responses came at the problem using traditional approaches within computer science and psychology. But one of the proposals stood out. It was from a young scientist who claimed to have figured out the “machine code of the brain.”
At roughly the same time Hinton was attending a conference with David Marr, the well-known MIT vision researcher, and he asked Marr whether the young scientist was crazy. Marr responded that he knew him, that he was very bright, and that he had no idea whether he was crazy or not. What was clear was that Sejnowski was pursuing a set of new ideas about cognition.
At the meeting Hinton and Sejnowski met for the first time. UCSD was already alive with a set of new ideas attempting to create new models of how the brain worked. Known as Parallel Distributed Processing, or PDP, it was a break from the symbol processing approach that was then dominating artificial intelligence and the cognitive sciences. They quickly realized they had been thinking about the problem from a similar perspective. They could both see the power of a new approach based on webs of sensors or “neurons” that were interconnected by a lattice of values representing connection strengths. In this new direction, if you wanted the network to interpret an image, you described the image in terms of a web of weighted connections. It proved to be a vastly more effective approach than the original symbolic model for artificial intelligence.
Everything changed in 1982 when Sejnowski’s former physics advisor at Princeton, John Hopfield, invented what would become known as the Hopfield Network. Hopfield’s approach broke from earlier neural network models that had been created by the designers of the first perceptrons, by allowing the individual neurons to update their values independently. The fresh approach to the idea of neural networks inspired both Hinton and Sejnowski to join in an intense collaboration.
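The idea of neurons updating their values independently can be made concrete with a small sketch. What follows is not Hopfield’s own formulation but a minimal illustration under stated assumptions: units take the values +1 or -1, connection strengths are set with a simple Hebbian rule, and the function names `train_hopfield` and `recall` are hypothetical. One pattern is stored, one unit is corrupted, and the network is then allowed to settle one neuron at a time.

```python
import random


def train_hopfield(patterns):
    """Build Hebbian connection strengths for +1/-1 patterns (no self-links)."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j] / n
    return weights


def recall(weights, state, sweeps=5, seed=0):
    """Settle the network by updating one neuron at a time in random order."""
    rng = random.Random(seed)
    state = list(state)
    n = len(state)
    for _ in range(sweeps):
        order = list(range(n))
        rng.shuffle(order)
        for i in order:
            # Each neuron looks only at the current values of the others.
            field = sum(weights[i][j] * state[j] for j in range(n))
            state[i] = 1 if field >= 0 else -1
    return state


stored = [1, -1, 1, -1, 1, -1, 1, -1]
weights = train_hopfield([stored])

noisy = list(stored)
noisy[0] = -noisy[0]  # corrupt a single unit

recovered = recall(weights, noisy)
print(recovered == stored)
```

Because each update uses whatever values the other neurons hold at that moment, the network behaves like a memory that pulls a damaged pattern back toward the one it stored.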
The two young scientists had both taken their first teaching positions by that time, Hinton at Carnegie Mellon and Sejnowski at Johns Hopkins, but they had become friends and were close enough that they could make the four-hour drive back and forth on weekends. They realized they had found a way to transform the original neural network model into a more powerful learning algorithm. They knew that humans learn by seeing examples and generalizing, and so mimicking that process became their focus. They created a new kind of multilayered network, which they called a Boltzmann Machine in homage to the Austrian physicist Ludwig Boltzmann. In their new model they conceived a more powerful approach to machine learning and made the most significant advance in design since the original single-layer learning algorithm designed by Rosenblatt.
Sejnowski had missed the entire political debate over the perceptron. As a physics graduate student he had been outside the world of artificial intelligence in the late 1960s when Minsky and Papert had made their attacks. Yet he had read the original Perceptrons book and he had loved it for its beautiful geometric insights. He had basically ignored their argument that the perceptron would not be generalizable to the world of multilayer systems. Now he was able to prove them wrong.
Hinton and Sejnowski had developed an alternative model, but they needed to prove its power in contrast to the rule-based systems popular at the time. During the summer, with the help of a graduate student, Sejnowski settled on a language problem to demonstrate the power of the new technique, training his neural net to pronounce English text as an alternative to a rule-based approach. At the time he had no experience in linguistics and so he went to the school library and checked out a textbook that was a large compendium of pronunciation rules. The book documented the incredibly complex set of rules and exceptions required to speak the English language correctly.
Halfway through their work on a neural network able to learn to pronounce English correctly, Hinton came to Baltimore for a visit. He was skeptical.
“This probably won’t work,” he said. “English is an incredibly complicated language and your simple network won’t be able to absorb it.”
So they decided to begin with a subset of the language. They went to the library again and found a children’s book with a very small set of words. They brought up the network and set it to work absorbing the language in the children’s book. It was spooky that within an hour it began to work. At first the sounds it generated were gibberish, like the sounds an infant might make, but as it was trained it improved continuously. Initially it got a couple of words correct and then it continued until it was able to perfect itself. It learned from both the general rules and the special cases.
They went back to the library and got another linguistics text. On one side of each page it contained a transcription of a story told by a fifth grader about what it was like in school and about a trip to his grandmother’s house. On the other side were the actual sounds for each word transcribed by a phonologist. It was a perfect teacher for their artificial neurons and so they ran that information through the neural network. It was a relatively small corpus, but the network began to speak just like the fifth grader. The researchers were amazed and their appetite was whetted.
Next they got a copy of a twenty-thousand-word dictionary and decided to see how far they could push their prototype neural network. This time they let the program run for a week on what was a powerful computer for its day, a Digital Equipment Corp. VAX minicomputer. It learned and learned and learned and ultimately it was able to pronounce new words it had never seen before. It was doing an amazingly good job.
They called the program Nettalk. It was built out of three hundred simulated circuits they called neurons. They were arranged in three layers—an input layer to capture the words, an output layer to generate the speech sounds, and a “hidden layer” to connect the two. The neurons were interconnected to one another by eighteen thousand “synapses”—links that had numeric values that could be represented as weights. If these simple networks could “learn” to hear, see, speak, and generally mimic the range of things that humans do, they were obviously a powerful new direction for both artificial intelligence and augmentation.
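The three-layer arrangement can be sketched in a few lines. The layer sizes below (203 inputs, 80 hidden units, 26 outputs) are an assumption chosen so that the totals land near the “three hundred” neurons and “eighteen thousand” synapses mentioned above; the random starting weights and the names `make_network` and `forward` are likewise illustrative, not the original program.

```python
import math
import random

# Assumed sizes for the input, hidden, and output layers.
LAYER_SIZES = (203, 80, 26)


def make_network(sizes, seed=0):
    """One weight matrix per pair of adjacent layers, with small random values."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-0.3, 0.3) for _ in range(n_in)] for _ in range(n_out)]
        for n_in, n_out in zip(sizes, sizes[1:])
    ]


def forward(network, inputs):
    """Pass activations through each layer, squashing sums with a sigmoid."""
    activations = inputs
    for layer in network:
        activations = [
            1.0 / (1.0 + math.exp(-sum(w * a for w, a in zip(row, activations))))
            for row in layer
        ]
    return activations


network = make_network(LAYER_SIZES)
total_neurons = sum(LAYER_SIZES)
total_synapses = sum(len(row) for layer in network for row in layer)
outputs = forward(network, [0.0] * LAYER_SIZES[0])

print(total_neurons, total_synapses, len(outputs))
```

Every “synapse” here is just one number in a weight matrix, which is why a network of a few hundred units can still carry many thousands of adjustable values.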
After the success of Nettalk, Sejnowski’s and Hinton’s careers diverged. Sejnowski moved to California and joined the Salk Institute, where his research focused on theoretical problems in neuroscience. In exploring the brain he became a deep believer in the power of diversity as a basic principle of biology—a fundamental divergence from the way modern digital computing evolved. Hinton joined the computer science department at the University of Toronto and over the next two decades he would develop the original Boltzmann Machine approach. From the initial supervised model, he found ways to add unsupervised (automatic) learning. The Internet became a godsend, providing vast troves of data in the form of crowd-sourced images, videos, and snippets of speech, both labeled and unlabeled. The advances would eventually underpin a dramatic new tool for companies like Google, Microsoft, and Apple that were anxious to deploy Internet services based on vision, speech, and pattern recognition.
This complete reversal of the perceptron’s fate also lay in part in a clever public relations campaign, years in the making. Before Sejnowski and Hinton’s first encounter in San Diego, a cerebral young French student, Yann LeCun, had stumbled across Seymour Papert’s dismissive discussion of the perceptron, and it sparked his interest. After reading the account, LeCun headed to the library to learn everything he could about machines that were capable of learning. The son of an aerospace engineer, he had grown up tinkering with aviation hardware and was steeped in electronics before going to college. He would have studied astrophysics, but he enjoyed hacking too much. He read the entire literature on the perceptron going back to the fifties and concluded that there was no one working on the subject in the early 1980s. It was the heyday of expert systems and no one was writing about neural networks.
In Europe his journey began as a lonely crusade. As an undergraduate he would study electrical engineering, and he began his Ph.D. work with someone who had no idea about the topic he was focusing on. Then shortly after he began his graduate studies he stumbled across an obscure article on Boltzmann Machines by Hinton and Sejnowski. “I’ve got to talk to these guys!” he thought to himself. “They are the only people who seem to understand.”
Serendipitously, it turned out that they were able to meet in the winter of 1985 in the French Alps at a scientific conference on the convergence of ideas in physics and neuroscience. Hopfield Networks, which served as an early model for human memory, had sparked a new academic community of interest. Although Sejnowski attended the meeting he actually missed LeCun’s talk. It was the first time the young French scientist had presented in English, and he had been terrified, mostly because there was a Bell Laboratories physicist at the conference who often arrogantly shot down each talk with criticisms. The people that LeCun was sitting next to told him that was the Bell Labs style—either the ideas were subpar, or the laboratory’s scientists had already thought of them. To his shock, when he gave his talk in broken English, the Bell Labs scientist stood up and endorsed it. A year later, Bell Labs offered LeCun a job.
Later in the meeting, LeCun cornered Sejnowski and the two scientists compared notes. The conversation would lead to the creation of a small fraternity of researchers who would go on to formulate a new model for artificial intelligence. LeCun finished his thesis work on an approach to training neural networks known as “back propagation.” His addition made it possible to automatically “tune” the networks to recognize patterns more accurately.
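What “tuning” a network means in practice can be shown with a toy example. The sketch below is a generic textbook form of back propagation, not LeCun’s thesis code: a tiny network with one hidden layer is trained on the XOR problem, and the error at the output is passed backward to adjust every weight. The network size, learning rate, random seed, and function names are all assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(w_hidden, w_out, x):
    """Return the hidden activations and the single output for one input."""
    xb = x + [1.0]  # constant bias input
    hidden = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_hidden]
    output = sigmoid(sum(w * v for w, v in zip(w_out, hidden + [1.0])))
    return hidden, output


def backprop_step(w_hidden, w_out, x, target, rate):
    """One gradient step on the squared error 0.5 * (output - target) ** 2."""
    hidden, output = forward(w_hidden, w_out, x)
    xb, hb = x + [1.0], hidden + [1.0]
    # Error signal at the output, then propagated back through the output weights.
    delta_out = (output - target) * output * (1.0 - output)
    delta_hidden = [
        delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
        for j in range(len(hidden))
    ]
    for j in range(len(hb)):
        w_out[j] -= rate * delta_out * hb[j]
    for j, row in enumerate(w_hidden):
        for i in range(len(xb)):
            row[i] -= rate * delta_hidden[j] * xb[i]


def total_error(w_hidden, w_out, data):
    return sum(0.5 * (forward(w_hidden, w_out, x)[1] - t) ** 2 for x, t in data)


XOR = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]

rng = random.Random(1)
w_hidden = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
w_out = [rng.uniform(-1, 1) for _ in range(4)]

error_before = total_error(w_hidden, w_out, XOR)
for _ in range(5000):
    for x, t in XOR:
        backprop_step(w_hidden, w_out, x, t, rate=0.5)
error_after = total_error(w_hidden, w_out, XOR)

print(error_before, error_after)
```

The point of the exercise is that no one writes a rule for XOR; the weights drift toward a solution simply because each step nudges them in the direction that shrinks the error.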
After leaving school LeCun looked around France to find organizations that were pursuing similar approaches to AI. Finding only a small ministry of science laboratory and a professor who was working in a related field, LeCun obtained funding and laboratory space. His new professor told him, “I’ve no idea what you’re doing, but you seem like a smart guy so I’ll sign the papers.” But he didn’t stay long. First he went off to Geoff Hinton’s neural network group at the University of Toronto, and when the Bell Labs offer arrived he moved to New Jersey, continuing to refine his approach known as convolutional neural nets, initially focusing on the problem of recognizing handwritten characters for automated mail-sorting applications. French-born Canadian Yoshua Bengio, a bright MIT-trained computer scientist, joined him at Bell Labs and worked on the character recognition software, and later on machine vision technology that would be used by the NCR Corporation to automatically read a sizable proportion of all the bank checks circulating in the world.
Yet despite their success, for years the neural network devotees were largely ignored by the mainstream of academic computer science. Thinking of themselves as the “three musketeers,” Hinton, LeCun, and Bengio set out to change that. Beginning in 2004 they embarked on a “conspiracy”—in LeCun’s words—to boost the popularity of the networks, complete with a rebranding campaign offering more alluring concepts of the technology such as “deep learning” and “deep belief nets.” LeCun had by this time moved to New York University, partly for closer ties with neuroscientists and with researchers applying machine-learning algorithms to the problem of vision.
Hinton approached a Canadian foundation, the Canadian Institute for Advanced Research, for support to organize a research effort in the field and to hold several workshops each year. Known as the Neural Computation and Adaptive Perception project, it permitted him to handpick the most suitable researchers in the world across a range of fields stretching from neuroscience to electrical engineering. It helped crystallize a community of people interested in neural network research.
Terry Sejnowski, Yann LeCun, and Geoffrey Hinton (from left to right), three scientists who helped revive artificial intelligence by developing biologically inspired neural network algorithms. (Photo courtesy of Yann LeCun)
This time they had something else going for them: the pace of computing power had accelerated, making it possible to build neural networks of vast scale, processing data sets orders of magnitude larger than before. It had taken almost a decade, but by then the progress, power, and value of the neural network techniques were indisputable. In addition to raw computing power, the other missing ingredient had been large data sets to use to train the networks. That would change rapidly with the emergence of the global Internet, making possible a new style of centralized computing power, cloud computing, as well as the possibility of connecting that capacity to billions of mobile sensing and computing systems in the form of smartphones. Now the neural networks could be easily trained on millions of digital images or speech samples readily available via the network.
As the success of their techniques became more apparent, Hinton began to receive invitations from different computer companies all looking for ways to increase the accuracy of a wide variety of consumer-oriented artificial intelligence services—speech recognition, machine vision and object recognition, face detection, translation and conversational systems. It seemed like the list was endless. As a consultant Hinton had introduced the deep learning neural net approach early on at Microsoft, and he was vindicated in 2012, when Microsoft’s head of research Richard Rashid gave a lecture in a vast auditorium in Tianjin, China. As the research executive spoke in English he paused after each sentence, which was then immediately translated by software into spoken Chinese in a simulation of his own voice. At the end of the talk, there was silence and then stunned applause from the audience.
The demonstration hadn’t been perfect, but by adding deep learning algorithm techniques the company had adopted from Hinton’s research, it had been able to reduce recognition errors by more than 30 percent. The following year a trickle of interest in neural networks turned into a torrent. The easy availability of Internet data sets and low-cost crowd-sourced labor provided both computing and human resources for training purposes.
Microsoft wasn’t alone. A variety of new neural net and other machine-learning techniques have led to a dramatic revival of interest in AI in Silicon Valley and elsewhere. Combining the new approach to AI with the Internet has meant that it is now possible to create a new service based on computer vision or speech recognition and then use the Internet and tens of millions of smartphone users to immediately reach a global audience.
In 2010 Sebastian Thrun had come to Google to start the Google X Laboratory, which was initially framed inside the company as Google’s version of Xerox’s Palo Alto Research Center. It had a broad portfolio of research projects, stretching from Thrun’s work in autonomous cars to efforts to scale up neural networks, loosely identified as “brain” projects, evoking a new wave of AI.
Google’s brain project was initially led by Andrew Ng, who had been a colleague of Thrun at the resurrected Stanford Artificial Intelligence Laboratory. Ng was an expert in machine learning and adept in some of the deep learning neural network techniques that Hinton and LeCun had pioneered. In 2011, he began spending time at Google building a machine vision system and the following year it had matured to the point where Google researchers presented a paper on how the network performed in an unsupervised learning experiment using YouTube videos. Training itself on ten million digital images found on YouTube, it performed far better than any previous effort by roughly doubling accuracy in recognizing objects from a challenging list of twenty thousand distinct items. It also taught itself to recognize cats, which is not surprising since there is an overabundance of cat images on YouTube. The Google brain assembled a dreamlike digital image of a cat by employing a hierarchy of memory locations to successively cull out general features after being exposed to millions of images. The scientists described the mechanism as a cybernetic cousin to what takes place in the brain’s visual cortex. The experiment was made possible by Google’s immense computing resources that allowed the researchers to turn loose a cluster of sixteen thousand processors on the problem, which of course is still a tiny fraction of the brain’s billions of neurons, a huge portion of which are devoted to vision.
Whether or not Google is on the trail of a genuine artificial “brain” has become increasingly controversial. There is certainly no question that the deep learning techniques are paying off in a wealth of increasingly powerful AI achievements in vision and speech. And there remains in Silicon Valley a growing group of engineers and scientists who believe they are once again closing in on “Strong AI”—the creation of a self-aware machine with human or greater intelligence.
Ray Kurzweil, the artificial intelligence researcher and barnstorming advocate for technologically induced immortality, joined Google in 2013 to take over the brain work from Ng, shortly after publishing How to Create a Mind, a book that purported to offer a recipe for creating a working AI. Kurzweil, of course, has all along been one of the most eloquent backers of the idea of a singularity. Like Moravec, he posits a great acceleration of computing power that would lead to the emergence of autonomous superhuman machine intelligence, in Kurzweil’s case pegging the date to sometime around 2023. The idea became codified in Silicon Valley in the form of the Singularity University and the Singularity Institute, organizations that focused on dealing with the consequences of that exponential acceleration.
Joining Kurzweil are a diverse group of scientists and engineers who believe that once they have discovered the mechanism underlying the biological human neuron, it will be simply a matter of scaling it up to create an AI. Jeff Hawkins, a successful Silicon Valley engineer who had founded Palm Computing with Donna Dubinsky, coauthored On Intelligence in 2004, which argued that the path to human-level intelligence lay in emulating and scaling up neocortex-like circuits capable of pattern recognition. In 2005, Hawkins formed Numenta, one of a growing list of AI companies pursuing pattern recognition technologies. Hawkins’s theory has parallels with the claims that Kurzweil makes in How to Create a Mind, his 2012 effort to lay out a recipe for intelligence. Similar paths have been pursued by Dileep George, a Stanford-educated artificial intelligence researcher who originally worked with Hawkins at Numenta and then left to form his own company, Vicarious, with the goal of developing “the next generation of AI algorithms,” and Henry Markram, the Swiss researcher who has enticed the European Union into supporting his effort to build a detailed replica of the human brain with one billion euros in funding.
In 2013 a technology talent gold rush that was already under way reached startling levels. Hinton left for Google because the resources available in Mountain View dwarfed what he had access to at the University of Toronto. There is now vastly more computing power available than when Sejnowski and Hinton first developed the Boltzmann Machine approach to neural networks, and there is vastly more data to train the networks on. The challenge now is managing a neural network that might have one billion parameters. To a conventional statistician that’s a nightmare, but it has spawned a sprawling “big data” industry that does not shy away from monitoring and collecting virtually every aspect of human behavior, interaction, and thought.
After his arrival at Google, Hinton promptly published a significant breakthrough in making more powerful and efficient learning networks by discovering how to keep the parameters from effectively stepping on each other’s toes. Rather than have an entire network process the whole image simultaneously, in the new model a subset is chosen, a portion of the image is processed, and the weights of the connections are updated. Then another random set is picked and the image is processed again. It offers a way to use randomness to reinforce the influence of each subset. The insight might be biologically inspired, but it’s not a slavish copy. By Sejnowski’s account, Hinton is an example of an artificial intelligence researcher who pays attention to the biology but is not constrained by it.
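The description above resembles the technique usually called “dropout,” and the sketch below assumes that reading. It shows only the core move: on each training pass a random subset of units is silenced, so that no unit can lean too heavily on its neighbors. The rescaling of the surviving units (often called “inverted” dropout) is a common variant chosen here for convenience, and the function name and sample values are hypothetical.

```python
import random


def dropout(activations, drop_prob, rng, training=True):
    """Silence a random subset of units and rescale the ones that survive."""
    if not training or drop_prob == 0.0:
        # At test time the whole network is used unchanged.
        return list(activations)
    keep = 1.0 - drop_prob
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(42)
hidden = [0.2, 0.9, 0.5, 0.7, 0.1, 0.4]

pass_one = dropout(hidden, 0.5, rng)  # one random subset
pass_two = dropout(hidden, 0.5, rng)  # a different random subset
at_test_time = dropout(hidden, 0.5, rng, training=False)

print(pass_one)
print(pass_two)
print(at_test_time)
```

Each pass in effect trains a slightly different, thinner network, and the full network used afterward behaves like an average of them.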
In 2012 Hinton’s networks, trained on a huge farm of computers at Google, did remarkably well at recognizing individual objects, but they weren’t capable of “scene understanding.” For example, the networks could not look at a picture and arrive at a sentence such as: “There is a cat sitting on the mat and there is a person dangling a toy at the cat.” The holy grail of computer vision requires what AI researchers call “semantic understanding,” the ability to interpret the scene in terms of human language. In the 1970s the challenge of scene understanding was strongly influenced by Noam Chomsky’s ideas about generative grammar as a context for objects and a structure for understanding their relation within a scene. But for decades the research went nowhere.
However, late in 2014, the neural network community began to make transformative progress in this domain as well. Around the country research groups reported progress in combining the learning properties of two different types of neural networks, one to recognize patterns in human language and the other to recognize patterns in digital images. Strikingly, they produced programs that could generate English-language sentences that described images at a high level of abstraction.44 The advance will help in applications that improve the results generated by Internet image search applications. The new approach also holds out the potential for creating a class of programs that can interact with humans with a more sophisticated level of understanding.
Deep learning nets have made significant advances, but for Hinton, the journey is only now beginning. He said recently that he sees himself as an explorer who has landed on a new continent and it’s all very interesting, but he has only progressed a hundred yards inland and it’s still looking very interesting—except for the mosquitoes. In the end, however, it’s a new continent and the researchers still have no idea what is really possible.
In late 2013, LeCun followed Hinton’s move from academia to industry. He agreed to set up and lead Facebook’s AI research laboratory in New York City. The move underscored the renewed corporate enthusiasm for artificial intelligence. The AI Winter was only the dimmest of memories. It was now clearly AI Spring.
Facebook’s move to join the AI gold rush was an odd affair. It began with a visit by Mark Zuckerberg, Facebook cofounder and chief executive, to an out-of-the-way technical conference called Neural Information Processing Systems, or NIPS, held in a Lake Tahoe hotel at the end of 2013. The meeting had long been a bone-dry academic event, but Zuckerberg’s appearance to answer questions was a clear bellwether. Not only were the researchers unused to appearances by high-visibility corporate tycoons, but Zuckerberg was accompanied by uniformed guards, lending the event a surreal quality. The celebrity CEO filled the room he was in and several other workshops were postponed as a video feed was piped into an overflow room. “The tone changed rapidly: accomplished professors became little more than lowly researchers shuffling into the Deep Learning workshop to see a Very Important Person speak,”45 blogged Alex Rubinsteyn, a machine-learning researcher who was an attendee at the NIPS meeting.
In the aftermath of the event there was an alarmed back-and-forth inside the tiny community of researchers about the impact of commercialization of AI on the culture of the academic research community. It was, however, too late to turn back. The field has moved on from the intellectual quarrels in the 1950s and 1960s over the feasibility of AI and the question of the correct path. Today, a series of probabilistic mathematical techniques have reinvented the field and transformed it from an academic curiosity into a force that is altering many aspects of the modern world.
It has also created an increasingly clear choice for designers. It is now possible to design humans into or out of the computerized systems that are being created to grow our food, transport us, manufacture our goods and services, and entertain us. It has become a philosophical and ethical choice, rather than simply a technical one. Indeed, the explosion of computing power and its accessibility everywhere via wireless networks has reframed with new urgency the question addressed so differently by McCarthy and Engelbart at the dawn of the computing age.
In the future will important decisions be made by humans or by the deep learning-style algorithms? Today, the computing world is demarcated between those who focus on creating intelligent machines and those who focus on how human capabilities can be extended by the same machines. Will it surprise anyone that the differing futures emerging from those opposing stances must be very different worlds?