Machines of Loving Grace: The Quest for Common Ground Between Humans and Robots - John Markoff (2015)
Chapter 8. “ONE LAST THING”
Set on the Pacific Ocean a little more than an hour’s drive south of San Francisco, Santa Cruz exudes a Northern California sensibility. The city blends the Bohemian flavor of a college town with the tech-savvy spillover from Silicon Valley just over the hill. Its proximity to the heart of the computing universe and its deep countercultural roots are distinct counterpoints to the tilt-up office and manufacturing buildings that are sprinkled north from San Jose on the other side of the mountains. Geographically and culturally, Santa Cruz is about as far away from the Homestead-Miami Speedway as you can get.
It was a foggy Saturday morning in this eclectic beach town, just months after the Boston Dynamics galloping robots stole the show at the steamy Florida racetrack. Bundled against the morning chill, Tom Gruber and his friend Rhia Gowen wandered into The 418 Project, a storefront dance studio that backs up against the river. They were among the first to arrive. Gruber is a wiry salt-and-pepper-goateed software designer and Gowen is a dance instructor. Before returning to the United States several years ago, she spent two decades in Japan, where she directed a Butoh dance theater company.
Tom Gruber began his career as an artificial intelligence researcher who swung from AI to work on augmenting human intelligence. He was a cofounder of the team of programmers who designed Siri, Apple’s iPhone personal assistant. (Photo © 2015 by Tom Gruber)
In Santa Cruz, Gowen teaches a style of dance known as Contact Improvisation, in which partners stay physically in touch with each other while moving in concert with a wide range of musical styles. To the untrained eye, “Contact Improv” appears to be part dance, part gymnastics, a bit of tumbling, and even part wrestling. Dancers use their bodies in a way that provides a sturdy platform for their partners, who may roll over and even bounce off them in sync with the music. The Saturday-morning session that Gruber and Gowen attended was even more eclectic: it was a morning weekend ritual for the Santa Cruz Ecstatic Dance Community. Some basic rules are spelled out at ecstaticdance.org:
1. Move however you wish;
2. No talking on the dance floor;
3. Respect yourself and one another.
There is also an etiquette that requires that partners be “sensitive” if they want to dance with someone and that offers a way out if they don’t: “If you’d rather not dance with someone, or are ending a dance with someone, simply thank them by placing your hands in prayer at your heart.”
The music mix that morning moved from meditative jazz to country, rock, and then to a cascade of electronic music styles. The room gradually filled with people, and the dancers each entered a personal zone. Some danced together, some traded partners, some swayed to an inner rhythm. It was free-form dance evocative of a New Age gym class.
Gruber and Gowen wove through the throng. Sometimes they were in contact, and sometimes they broke off to dance with other partners, then returned. He picked her up and bent down and let her roll across his back. It wasn’t exactly “do-si-do your partner,” but if the move was done well, one body formed a platform that shouldered the other partner’s weight without strain. Gruber was a confident dancer and comfortable with moves that evoked a modern dance sensibility. It offered a marked contrast to the style of many of the more hippie, middle-aged Californians, who were skipping and waving in all directions against a quickening beat. The pace of the dancers ascended to a frenzy and then backed down to a mellower groove. Gradually, the dancers melted away from the dance floor. Gruber and Gowen donned their jackets and stepped out into the still-foggy morning air.
Gruber casually pulled an iPhone from his pocket and asked Siri, the software personal assistant he designed, a simple question about his next stop. On Monday he would be back in the fluorescent-lit hallways of Apple, amid endless offices overloaded with flat-panel displays. On that morning, however, he wandered in a more human-centric world, where computers had disappeared and everyday devices like phones were magical.
Apple’s corporate campus is circumscribed by Infinite Loop, a balloon-shaped street set just off the Interstate 280 freeway in Cupertino. The road wraps in a protective circle around a modern cluster of six office buildings facing inward onto a grassy courtyard. It circles a corporate headquarters that reflects Apple’s secretive style. The campus was built during the era in which John Sculley ran the company. When originally completed, it served as a research and development center, but as Apple scaled down after Sculley left in 1993, it became a fortress for an increasingly besieged company. When Steve Jobs returned, first as “iCEO” in 1997, there were many noticeable changes including a dramatic improvement in the cafeteria food. The fine silver that had marked the executive suite during the brief era when semiconductor chief Gilbert Amelio ran the company also disappeared.
As his health declined during a battle with pancreatic cancer in 2011, Steve Jobs came back for one last chapter at Apple. He had taken his third medical leave, but he was still the guiding force at the company. He had stopped driving and so he would come to Apple’s corporate headquarters with the aid of a chauffeur. He was bone-thin and in meetings he would mention his health problems, although never directly acknowledging the battle was with cancer. He sipped 7UP, which hinted to others that he might have been struggling through chemotherapy.
The previous spring Jobs had acquired Siri, a tiny developer of a natural language software application that was designed to act as a virtual assistant, in effect a software assistant, on the iPhone. The acquisition had drawn a great deal of attention in Silicon Valley. Apple acquisitions, particularly large ones, are extremely rare. When word circulated that the firm had been acquired, possibly for more than $200 million, it sent shock waves up and down Sand Hill Road and within the burgeoning “app economy” that the iPhone had spawned. After Apple acquired Siri, the program was immediately pulled from the App Store, the iPhone service through which programs were screened and sold, and the small team of programmers who had designed Siri vanished back into “stealth mode” inside the Cupertino campus. The larger implications of the acquisition weren’t immediately obvious to many in the Valley, but as one of his last acts as the leader of Apple, Steve Jobs had paved the way for yet another dramatic shift in the way humans would interact with computers. He had come down squarely on the side of those who placed humans in control of their computing systems.
Jobs had made a vital earlier contribution to the computing world by championing the graphical desktop computing approach as a more powerful way to operate a PC. The shift from the command line interface of the IBM DOS era to the desktop metaphor of the Macintosh had opened the way for the personal computer to be broadly adopted by students, designers, and office workers—a computer for “the rest of us,” in Apple parlance. Steve Jobs’s visits to PARC are the stuff of legend. With the giant copier company’s blessing and a small but lucrative Xerox investment in Apple pre-IPO, he visited several times in 1979 and then over the next half decade created first the Lisa and then the Macintosh.
But the PC era was already giving way to a second Xerox PARC concept—ubiquitous computing. Mark Weiser, the PARC computer scientist, had conceived the idea during the late 1980s. Although he had been given less credit for the insight and the shift, Jobs had been the first to successfully translate Weiser’s ideas for general consumer audiences. The iPod and then the iPhone were truly ubiquitous computing devices. Jobs first transformed the phonograph and then the telephone by adding computing. “A thousand songs in your pocket” and “something wonderful for your hand.” He was the consummate showman, and “one more thing” had become a trademark slogan that Jobs used at product introductions, just before announcing something “insanely great.” For Jobs, however, Siri was genuinely his “one last thing.” By acquiring Siri he took his final bow for reshaping the computing world. He bridged the gap between Alan Kay’s Dynabook and the Knowledge Navigator, the elaborate Apple promotional video imagining a virtual personal assistant. The philosophical distance between AI and IA had resulted in two separate fields that rarely spoke. Even today, in most universities artificial intelligence and human-computer interaction remain entirely separate disciplines. In a design approach that resonated with Lee Felsenstein’s original golemics vision, Siri would become a software robot—equipped with a sense of humor—intended to serve as a partner, not a slave.
It was an extraordinary demand that only Steve Jobs would have considered. He directed his phone designers to take a group of unknown software developers, who had never seen any of Apple’s basic operating system software, and allow them to place their code right at the heart of the iPhone. He then forced his designers to create connections to all of the iPhone’s application programs from the ground up. And he ordered that it all happen in less than a year. To supplement the initial core of twenty-four people who had arrived with the Siri acquisition, the programmers borrowed and begged from various corners of Apple’s software development organization. But it wasn’t enough. In most technical companies a demand of this scale would be flatly rejected as impossible. Jobs simply said, “Make it happen.”
Tom Gruber was a college student studying psychology in the late 1970s when he stumbled upon artificial intelligence. Wandering through his school library, he found a paper describing the work of Raj Reddy and a group of Carnegie Mellon University computer scientists who had built a speech recognition system called Hearsay-II. The program was capable of recognizing just a thousand words spoken in sentences with a 90 percent accuracy rate. One error every ten words, of course, was not usable. What struck Gruber, though, was that the Hearsay system married acoustic signal processing with more general artificial intelligence techniques. He immediately realized that the system implied a model of the brain that was required to represent human knowledge. He realized that psychologists were also modeling this process, but poorly. At that point in the 1970s, there were no PET scans or fMRI brain-imaging systems. Psychologists were studying human behavior, but not the brain itself.
Not long after reading about the Hearsay research, Gruber found the early work of Edward Feigenbaum, a Stanford University computer science professor who focused on the idea of building “expert systems” to capture human knowledge and replicate the capabilities of specialists in highly technical fields. While he was a graduate student at Carnegie Mellon working with Herbert Simon, Feigenbaum had done research in designing computer models of human memory. The Elementary Perceiver and Memorizer, or EPAM, was a psychological theory of human learning and memory that researchers could integrate into a computer program.
Feigenbaum’s work inspired Gruber to think more generally about building models of the mind. At this point, however, he hadn’t considered applying to graduate school. No one in his family had studied for an advanced degree and the idea wasn’t on his radar. By the time he finally sent out applications, there were only a few places that would still offer him funding. Both Stanford and MIT notified Gruber that his application was about three months late for the upcoming school year, and they invited him to apply again in the future. Luckily, he was accepted by the University of Massachusetts, which at the time was home to a vibrant AI group that was doing research in robotics, including how to program robotic hands. The program’s academic approach to robotics explicitly melded artificial intelligence and cognitive science, which spoke perfectly to his interest in modeling the human mind.
For Gruber, AI turned out to be the fun part of computer science. It was philosophically rich and scientifically interesting, offering ideas about psychology and the function of the human mind. In his view, the rest of computer science was really just engineering. When he arrived in Massachusetts in 1981, he worked with Paul Cohen, a young computer scientist who had been a student of Feigenbaum’s at Stanford and shared Gruber’s interest in AI and psychology. Paul Cohen’s father, Harold Cohen, was a well-known artist who had worked at the intersection of art and artificial intelligence. He had designed the computer program Aaron and used it to paint and sell artistic images. The program didn’t create an artistic style, but it was capable of generating an infinite series of complex images based on parameters set by Cohen. Aaron proved to be a powerful environment for pondering philosophical questions about autonomy and creativity.
Gruber had mentioned to the computer science department chairman that he wanted to have a social impact in his career and so he was directed to a project designing systems that would allow people with severely crippling conditions like cerebral palsy to communicate. Many of those with the worst cases couldn’t speak and, at the time, used a writing system called Bliss Boards that allowed them to spell words by pointing at letters. This was a painstaking and limiting process. The system that Gruber helped develop was an early version of what researchers now call “semantic autocomplete.” The researchers worked with children who could understand language clearly but had difficulty speaking. They organized the interaction scheme so the system anticipated what a participant might say next. The challenge was to create a system to communicate things like “I want a hamburger for lunch.”
It was a microcosm of the entire AI world at the time. There was no big data; researchers could do little more than build a small model of the child’s world. After working on this project for a while, Gruber built a software program to simulate that world. He made it possible for the caregivers and parents to add sentences to the program that personalized the system for a particular child. Gruber’s program was an example of what the AI community would come to call “knowledge-based systems,” programs that would reason about complex problems using rules and a database of information. The idea was to create a program that would be able to act like a human expert such as a doctor, lawyer, or engineer. Gruber, however, quickly realized that acquiring this complex human knowledge would be difficult and made this problem the subject of his doctoral dissertation.
Gruber was a skilled computer hacker, and many faculty members wanted to employ him to do their grunt work. Instead, he moonlighted at Digital Equipment Corporation, the minicomputer manufacturer. He was involved in a number of projects for DEC, including the development of an early windowing system that was written in McCarthy’s AI programming language Lisp. The fact that the program ran well surprised many software developers because Lisp was not intended for graphical applications where you needed blinding speed. It took Gruber a month during the summer to write the program. It was much more common for developers to write these kinds of applications in assembly language or C in order to save time, but it turned out that for Gruber, Lisp was efficient enough. To show off the power of the Lisp programming language, he built a demo of an automated “clipping service” for visitors from the NSA. The program featured an interactive interface that allowed a computer user to tailor a search, then save it in a permanent alert system that would allow the filtering of that information. The idea stuck with him, and he would reuse it years later when he founded his first company.
Focused on getting a Ph.D. and still intrigued by science-of-mind questions, he avoided going to work for the then booming DEC. Graduate school was nirvana. He rode his bike frequently in Western Massachusetts and was able to telecommute, making more than thirty dollars an hour from his home terminal. He spent his summers in Cambridge because it was a lively place to be, working in Digital’s laboratory. He also became part of a small community of AI researchers who were struggling to build software systems that approximated human expertise. The group met annually in Banff. AI researchers quickly realized that some models of human reasoning defied conventional logic. For example, engineering design is made up of a set of widely divergent activities. An HVAC—heating, ventilation, and air-conditioning—system designer might closely follow a set of rules and constraints with few exceptions. In optics, precise requirements make it possible to write a program that would design the perfect glass. Then there is messy design, product design, for example, where there are no obvious right answers and a million questions about what is required and what is optional. In this case the possible set of answers is immense and there is no easy way to capture the talent of a skilled designer in software.
Gruber discovered early in his research why conventional expert systems models failed: human expertise isn’t reducible to discrete ideas or practices. He had begun by building small models, like a tool for minimizing the application of pesticides on a tree farm. Separately, he worked with cardiologists to build a diagnostic system that modeled how they used their expertise. Both were efforts to capture human expertise in software. Very simple models might work, but the complexity of real-world expertise was not easily reducible to a set of rules. The doctors had spent decades practicing medicine, and Gruber soon realized that attempting to reduce what they did to “symptoms and signs” was impossible. A physician might ask patients about what kind of pain they were experiencing, order a test, and then prescribe nitroglycerin and send them home. Medicine could be both diagnostic and therapeutic. What Gruber was seeing was a higher-level strategy being played out by the human experts, far above the rote actions of what was then possible with relatively inflexible expert system programs.
He soon realized that he wasn’t interested in building better expert systems. He wanted to build better tools to make it easier for people to design better expert systems. This was to become known as the “knowledge acquisition problem.” In his dissertation he made the case that researchers did not need to model knowledge itself but rather strategy—that is, knowledge about what to do next—in order to build a useful expert system. At the time, expert systems broke easily, were built manually, and required experts to compile the knowledge. His goal was to design a way to automate the acquisition of this elusive “strategic knowledge.”
As a graduate student his approach was within the existing AI community framework: At the outset he defined artificial intelligence conventionally, as being about understanding intelligence and performing human-level tasks. Over time, his perspective changed. Not only should AI imitate human intelligence; he came to believe it should aim to amplify that intelligence as well. He hadn’t met Engelbart and he wasn’t familiar with his ideas, but using computing to extend, rather than simulate or replace, humans would become a motivating concept in his research.
While he was still working on his dissertation he decided to make the leap to the West Coast. Stanford was the established center for artificial intelligence research and Ed Feigenbaum, then a rising star in the AI world, was working there. He had launched a project to build the world’s largest expert system on “engineering knowledge,” or how things like rocket ships and jet engines were designed and manufactured. Gruber’s advisor Paul Cohen introduced him to Feigenbaum, who politely told him that his laboratory was on soft money and he just didn’t have any slots for new employees.
“What if I raise my own money?” Gruber responded.
“Bring your own money?!”
Feigenbaum agreed, and Gruber obtained support from some of the companies he had consulted for. Before long, he was managing Feigenbaum’s knowledge engineering project. In 1989, Gruber thus found himself at Stanford University during the personal computing boom and the simultaneous precipitous decline of the AI field in the second AI Winter. At Stanford, Gruber was insulated from the commercial turmoil. Once he started on Feigenbaum’s project, however, he realized that he was still faced with the problem of how to acquire the knowledge necessary to simulate a human expert. It was the same stumbling block he had tried to solve in his dissertation. That realization quickly led to a second: to transition from “building” to “manufacturing” knowledge systems, developers needed standard parts. He became part of an effort to standardize languages and categories used in the development of artificial intelligence. Language must be used precisely if developers want to build systems in which many people and programs communicate. The modules would fail if they didn’t have standardized definitions. The AI researchers borrowed the term “ontology,” which was the philosophical term for the study of being, using it in a restricted fashion to refer to the set of concepts—events, items, or relations—that constituted knowledge in some specific area. He made the case that an ontology was a “treaty,” a social agreement among people interested in sharing information or conducting commerce.
It was a technology that resonated perfectly with the then new Internet. All of a sudden a confused world of multiple languages and computer protocols was connected in an electronic Tower of Babel. When the World Wide Web first emerged, it offered a universal mechanism for easily retrieving documents via the Internet. The Web was loosely based on the earlier work of Doug Engelbart and Ted Nelson in the 1960s, who had independently pioneered the idea of hypertext linking, making it possible to easily access information stored in computer networks. The Web rapidly became a medium for connecting anyone to anything in the 1990s, offering a Lego-like way to link information, computers, and people.
Ontologies offered a more powerful way to exchange any kind of information by combining the power of a global digital library with the ability to label information “objects.” This made it possible to add semantics, or meaning, to the exchange of electronic information, effectively a step in the direction of artificial intelligence. Initially, however, ontologies were the province of a small subset of the AI community. Gruber was one of the first developers to apply engineering principles to building ontologies. Focusing on that engineering effort drew him into collaborative work with a range of other programmers, some of whom worked across campus and others a world away. He met Jay “Marty” Tenenbaum, a computer scientist who had previously led research efforts in artificial intelligence at SRI International and who at the time directed an early Silicon Valley AI lab set up by the French oil exploration giant Schlumberger. Tenenbaum had an early and broad vision about the future of electronic commerce, preceding the World Wide Web. In 1992 he founded Enterprise Integration Technologies (EIT), a pioneer in commercial Internet transactions, at a time when the idea of “electronic commerce” was still largely unknown.
From an office near the site where the Valley’s first chipmaker, Fairchild Semiconductor, once stood, Tenenbaum sketched out a model of “friction free” electronic commerce. He foresaw a Lego-style automated economy in which entire industries would be woven together by computer networks and software systems that automated the interchange of goods and services. Gruber’s ontology work was an obvious match for Tenenbaum’s commerce system because it was a system that required using a common language to connect disparate parts. Partly as a result of their collaboration, Gruber was one of the first Silicon Valley technologists to immerse himself in the World Wide Web. Developed by Tim Berners-Lee in the heart of the particle physics community in Switzerland, the Web was rapidly adopted by computer scientists. It became known to a much wider audience when it was described in the New York Times in December of 1993.1
The Internet allowed Gruber to create a small group that blossomed into a living cyber-community expressed in the exchange of electronic mail. Even though few of the participants had face-to-face contact, they were in fact a “virtual” organization. The shortcoming was that all of their communications were point-to-point and there was no single shared copy of the group electronic conversation. “Why don’t I try to build a living memory of all of our exchanges?” Gruber thought. His idea was to create a public, retrievable, permanent group memory. Today, with online conferences, support systems, and Google, the idea seems trivial, but at the time it was a breakthrough. It had been at the heart of Doug Engelbart’s original NLS. As the personal computer emerged, however, much of Engelbart’s broader vision had been sidelined: first Xerox PARC and then Apple and Microsoft had cherry-picked his ideas, like the mouse and hypertext, while ignoring his broader mission for an intelligence augmentation system that would facilitate small groups of knowledge workers. Gruber created a software program that automatically generated a living document of the work done by a group of people. Over a couple of weeks he sat down and built a program named Hypermail that would “live” on the same computer that was running a mail server and would generate a threaded copy of an email conversation that could be retrieved from the Web. What emerged was a digital snapshot of the email conversation complete with permanent links that could be bookmarked and archived.
The emergence of the World Wide Web was a life-changing event for Gruber. He was now thirty years old and working at Stanford, and he quickly realized that the Web was a much bigger idea than anything he had worked on previously. He recognized that Tenenbaum was onto something with dramatic potential to change the way people used computers. Tenenbaum had hired a young programmer named Kevin Hughes who had come to the project from Hawaii Community College. Hughes was representative of a new class of programmer that was emerging from a generation who had grown up with computing. He didn’t look like he was old enough to drive, but he called himself a “webmaster.” Gruber had initially written Hypermail in his favorite programming language, Lisp, and shared it through the software channels that were popular at the time. To Hughes, that approach was dated. He told Gruber that Hypermail had to be rewritten in C and that it had to be given away freely. Gruber convinced Tenenbaum, and then took a weekend to rewrite the program in C. Hughes was right. Once it was freely available on the Web, its use exploded.
It was a major step in bringing the Engelbart-Nelson hypertext vision to life. Overnight, anyone who was running a list server on a Unix computer could drop the program on their computer and their electronic conversations would be instantly available to the broader Internet. It was a powerful lesson for Gruber about how the Internet could be used to leverage a simple idea. EIT was purchased by VeriFone early in 1995, just at the outset of the dot-com era. Two years later VeriFone, known for its point-of-sale terminals, was itself purchased by HP during the run-up of the first Internet bubble, only to be cast out again after the bubble burst. Gruber had left Stanford to join EIT in 1994 but left before EIT was sold the first time to pursue his own ideas. Why stop with email? he wondered. He set out to build on a large chunk of Engelbart’s vision and sell it to corporate America.
In the early 1990s, Engelbart’s ideas enjoyed a renaissance at Stanford. The four years Gruber had spent at the university trying to create Feigenbaum’s engineering knowledge system from piles of statements of rules and assertions and ontologies hadn’t succeeded. In his Hypermail project, Gruber saw a way to build a valuable commercial knowledge system and, in the entrepreneurial heat of the dot-com explosion, set out to create his own company to do so. Berners-Lee had made the original breakthrough when he designed the World Wide Web. It was not just that he had created a working version of the Engelbart-Nelson hypertext system. He had established a system of permanent identifiers for bundles of information that the engineers described as “knowledge objects.” That changed everything. It allowed Web developers to create persistent knowledge structures that functioned as usable digital libraries, upon which it was possible to build both artificial intelligence and augmentation systems.
Gruber’s idea was to build a “corporate memory,” a system that would weave together all the documents that made a modern organization function, making them easy to structure and recall. It was reminiscent of Engelbart’s original oN-Line System, but was modernized to take advantage of the power of Berners-Lee’s invention. Lotus Notes had been an earlier effort by Ray Ozzie, then a young software designer working on a contract basis for Mitch Kapor at Lotus, but it was stuck in the proprietary world of corporate enterprise software. Now the Internet and new Web standards made it possible to build something with far greater scope.
With another AI researcher, Peter Friedland, and former DARPA program manager Craig Wier, Gruber founded Intraspect in 1996 in Los Altos and became the chief technology officer. In the beginning he worked with one programmer, who had a day job at Stanford. The programmer arrived in the evening after Gruber had worked on the prototype during the day and took over and continued development into the night. As Gruber was leaving at the end of the day, he discussed what he had done and what needed to be completed. They would iterate—it was an effective way to rapidly build a prototype.
The company eventually raised more than $60 million in venture funding and would have as many as 220 employees and a polished product. They were able to quickly build a base of blue-chip companies as well, including GTE, General Motors, KPMG, Boeing, and Siemens. The PC era had transformed the corporation and companies were being run with electronic mail rather than with printed interoffice memos. This shift made it possible to create an inexpensive system that simply “swallowed” every communication that was CC’ed to a special email address. No webmaster or heavy IT presence was necessary. The Intraspect system ingested corporate communications and documents and made them instantly accessible to anyone in a company with access to a personal computer. Desktop folder icons were still the common metaphor for organized documents, and so the Intraspect engineers built a Windows program based on a folder-oriented interface.
In Gruber’s mind this was what the future of AI should be. What had started as an effort to model a human brain would shift in focus and end up as an endeavor to model the interactions of a human group. In a sense, this distinction was at the heart of the cultural divide between the two research communities. The AI community began by trying to model isolated human intelligence while the emerging community of human-computer interaction designers followed in Engelbart’s augmentation tradition. Engelbart had begun by designing a computer system that enhanced the capabilities of small groups of people who collaborated. Now Gruber had firmly aligned himself with the IA community. At the Stanford Knowledge Systems Laboratory, he had interviewed avionics designers and taken their insights to heart. There had been an entire era of industrial design during which designers assumed that people would adapt to the machine. Designers originally believed that the machine was the center of the universe and the people who used the machines were peripheral actors. Aircraft designers had learned the hard way that until they considered the human-machine interaction as a single system, they built control systems that led to aircraft crashes. It simply wasn’t possible to account for all accidents by blaming pilot error. Aircraft cockpit design changed, however, when designers realized that the pilot was part of the system. Variables like attention span and cognitive load, which had been pioneered and popularized by psychologists, became integral first to avionics and, more recently, to computer system design.
Gruber thought hard about these issues while he designed the Intraspect query system. He imagined customers, often corporate salespeople, as aircraft pilots and tried to shape the program to avoid deluging them with information. Intraspect demonstrated the system to a J.P. Morgan executive. Gruber performed a simple search and the executive saw relevant recent communications between key employees at the firm with relevant documents attached. His jaw dropped. He could literally see what his company knew. The Intraspect system used a search engine that was engineered to prioritize both the most recent and most relevant documents, which was something that was not yet widely offered by the first generation of Internet search engines.
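The idea of ranking by both recency and relevance can be illustrated with a small sketch. This is not Intraspect's actual engine, whose internals the account above does not describe. The `Message` class, the term-overlap relevance measure, the 30-day half-life, and the choice to multiply the two scores are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Message:
    """Hypothetical stand-in for an archived email or document."""
    subject: str
    body: str
    sent: datetime


def relevance(query: str, msg: Message) -> float:
    """Fraction of query terms found in the message (a toy measure)."""
    terms = set(re.findall(r"\w+", query.lower()))
    words = set(re.findall(r"\w+", f"{msg.subject} {msg.body}".lower()))
    return len(terms & words) / len(terms) if terms else 0.0


def recency(msg: Message, now: datetime, half_life_days: float = 30.0) -> float:
    """Exponential decay: a message loses half its weight every half-life."""
    age_days = max((now - msg.sent).total_seconds() / 86400.0, 0.0)
    return 0.5 ** (age_days / half_life_days)


def rank(query: str, messages: List[Message], now: datetime) -> List[Message]:
    """Order matching messages by relevance times recency, best first."""
    scored = [(relevance(query, m) * recency(m, now), m) for m in messages]
    matches = [(score, m) for score, m in scored if score > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in matches]
```

Multiplying the two scores means an irrelevant message never surfaces just because it is new, while two equally relevant messages are ordered by age. A weighted sum would be an equally plausible design.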
At the peak of the dot-com era, Intraspect was doing spectacularly well. They had blue-chip customers and footholds in major industries like financial services and a run rate of $30 to $40 million in revenue. They had even written an S-1 in preparation for going public and had moved into a large new building with a prominent logo that was visible from the 101 freeway. Then everything collapsed. Although Intraspect survived the dot-com crash, the meltdown crippled some of its best customers.
September 11, 2001, followed. Overnight, everything changed. By the following March, CFOs at major companies were simply prohibiting the purchase of any product or service that wasn’t from a public company. The wave of fear nailed Intraspect. Gruber had spent six years building the company and at first he refused to believe that it was over. They had such strong customers and such a versatile product that they were convinced there must be a way to survive. But the company had been riding on its ability to leverage professional service firms like the Big Five accounting companies to sell its product, and those sales channels dried up during the crash.
Gruber was forced to lay off 60 percent of his company to stay afloat. In the end, Intraspect died with a whimper. Although it had advantages over its competitors, the entire market for collaborative corporate software collapsed. Portal companies, document management companies, search companies, and knowledge management companies all merged into one another. In 2003 Intraspect was sold at a fire-sale price to Vignette, and that was the end.
Gruber stayed at Vignette for a couple of months and then took a year off to recharge and think about what he would do next. He traveled to Thailand, where he went scuba diving and took pictures. He discovered Burning Man, the annual weeklong gathering in the Nevada desert that attracted tens of thousands of the Valley’s digerati. When Gruber’s sabbatical year ended he was ready to build a new company.
He knew Reid Hoffman, who had by then started LinkedIn, the business networking company. Because of his experience at Intraspect, Gruber had good insights into “social software.” The two men had a long series of conversations about Gruber joining the start-up, which was on track to become one of Silicon Valley’s early social networking success stories. Gruber wanted to focus on design issues and Hoffman was looking for a new CTO, but in the end the LinkedIn board vetoed the idea because the company was on the verge of a large investment round.
Gruber’s year of traveling had left him thinking about the intersection of travel and “collective intelligence” that was coming to life with the emergence of “Web 2.0.” The Internet had not only made it possible to create corporate memories, but now crowdsourcing had become trivial for any human endeavor. Google, of course, was the most spectacular example. The company’s PageRank search algorithm exploited human preferences to rank Internet search query results. Through Reid Hoffman, Gruber found a start-up that was planning to compete with TripAdvisor, which at that point was only offering travelers’ reviews of hotels. He convinced them that he could bring them a big audience—they just needed to handle the business development side of the project. And so Gruber started over as the vice president of design at this new start-up, although this time he had a team of three engineers instead of sixty. Having a small army of programmers, however, was no longer critical to the success of a company—the Internet had changed everything. Even the smallest start-ups could leverage vastly more powerful development toolkits.
The start-up planned to collect the best trip descriptions that global travelers had to offer. It took them a year to build the service and they unveiled realtravel.com at the 2006 O’Reilly Web 2.0 Conference—an Internet event that had rapidly become the conference of choice for the next wave of so-called social start-ups. Realtravel.com grew fast—it even boasted a couple million unique visitors at one point—but it didn’t grow quickly enough, and the company was sold in 2008, just two years after receiving its seed funding. Gruber had left the company before it was sold, having feuded with the CEO—who was color-blind—over issues like what were the best colors on the site’s Web pages.
He took another year off. He had worked in various positions at realtravel.com—from writing code to overseeing design—and he needed the time away. When he returned, he used his Silicon Valley network of contacts to look for interesting projects. He was a member of an informal group called the CTO Club, which met regularly, and someone there mentioned a new project at SRI.
The research center had been showered with funding by DARPA under Tony Tether, who had taken a deep interest in building a software personal assistant. For five years, between 2003 and 2008, the Pentagon agency invested heavily in the idea of a “cognitive assistant.” The project would ultimately bring together more than three hundred researchers at twenty-five universities and corporate research laboratories, with SRI playing the role of the integrator for the project. The cognitive assistant, CALO, was in DARPA’s tradition of funding blue-sky research that had already created entire industries in Silicon Valley. Workstations, networking, and personal computing had all started as DARPA research projects.
The term “CALO” was inspired by calonis, a Latin word meaning “soldier’s low servant,” or clumsy drudge, but the project also had a significant overlap with Engelbart’s original work that was funded by DARPA in the sixties and seventies. CALO was intended to help an office worker with project management: it would organize workers’ email, calendars, documents, communication, schedules, and task management. Eventually, there were a number of commercial spin-offs from the CALO project—a smart calendar, a personalized travel guide, and a game development and education company—but they all paled compared to the success of Siri.
Long before the Maker Movement—the Silicon Valley subculture extolling an inclusive do-it-yourself approach to technology—gained steam, Gruber’s Siri cofounder Adam Cheyer was drafted into that world by his mother. As a child in a Boston suburb, he was restricted to just an hour of television each week, which offered him a brief glimpse of technology that whetted his appetite for the latest toys. When he asked his mother to buy toys for him, however, she responded by giving him a stack of cardboard inserts that cleaners used to stiffen shirts. He resorted to tape, glue, and scissors to re-create the toys that he had seen on television, like robots and Rube Goldberg contraptions. It taught Cheyer that with a small dose of imagination, he could make anything he wanted.2
As a child he dreamed of becoming a magician. He had read books about the great magicians and thought of them as inventors and tinkerers who tricked others by using technology. Before he was ten he was saving his money to buy books and tricks from the local magic store. Later, he realized that his interest in artificial intelligence was rooted in his love of magic. His favorite eighteenth-century magicians and clockmakers led by Jacques de Vaucanson had built early automata: chess-playing and speaking machines and other mechanical humanoid robots that attempted to illuminate the inner workings of what he, like Gruber, would come to see as the most magical device of all—the human brain.3
Although Cheyer knew nothing of Engelbart’s legendary NLS, in 1987 he built his own system called HyperDoc while working as an artificial intelligence researcher with Bull, the computer firm, in France. He integrated a documentation system into the editor the programmers were using to design their expert systems. That update made it possible to simply click on any function or command to view a related online manual. Having easy access to the software documentation made it simpler for developers to program the computers and reduced the number of bugs. At the time, however, he was unfamiliar with the history of Doug Engelbart’s Augmentation Research Center in Menlo Park during the 1960s and 1970s. He had moved to California to get a master’s degree in computer science, with a plan to move back to France after graduation. It had been a fun sojourn in California, but the French computer firm would pay for his schooling only if he returned to Europe.
Not long before he was scheduled to return, however, he stumbled across a small blurb advertising a job in an artificial intelligence research laboratory at SRI. The job sounded intriguing and he decided to apply. Before flying to the Bay Area for the interview, he read extensively on the work of all of the researchers in the group. Between interviews he went into the bathroom to scan his notes in preparation for each appointment. When he arrived, he knew everything that everyone had worked on, who they worked with, and what their views were on different issues. His research paid off. He was hired in the SRI Artificial Intelligence Center.
In the early 1990s, despite the AI Winter, SRI remained a thriving hub for commercial, military, and academic artificial intelligence research, and decades after Shakey, robots were still roaming the halls. When Cheyer arrived at the laboratory, he received a small research grant from a Korean telecom lab run by the South Korean government. The project funding was for a pen and voice control system for the office environment. “Build us one of those,” they instructed him.
He decided to build a system that would make it easy to plug in additional capabilities in the future. The system was named Open Agent Architecture, or OAA. It was designed to facilitate what Cheyer thought of as “delegated computing.” For example, if a computer needed to answer a question like, “What’s Bob’s email address?” there was a range of ways that it could hunt for the answer. Cheyer created a language that would make it possible for a virtual software assistant to interpret the task and hunt for the answer efficiently.
In designing his framework, he found that he was at the heart of a debate that was raging between artificial intelligence researchers and the rival human-computer interaction community. One group believed the user needed to be in complete control of the computer and the other group envisioned software agents that could “live” in computer networks and operate on behalf of human users. From the beginning Cheyer had a nuanced view of the ideal human-machine relationship. He thought that humans sometimes like to control systems directly, but often they just want the system to do something on their behalf without bothering them with the details. To that end, his language made it possible to separate what the user wanted the system to do or find from how the task would be accomplished.
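The separation of what from how can be sketched in a few lines. This is an illustration of delegated computing in general, not a reconstruction of OAA or of Cheyer's language. The `Facilitator` class, the goal name `email_address`, and the two agents with their sample data are hypothetical.

```python
from typing import Callable, Dict, List, Optional

# A solver takes the request parameters and returns an answer, or None
# if it cannot help. Each solver is one possible "how".
Solver = Callable[[dict], Optional[str]]


class Facilitator:
    """Routes a stated goal (the "what") to whichever agent can satisfy it."""

    def __init__(self) -> None:
        self._solvers: Dict[str, List[Solver]] = {}

    def register(self, goal: str, solver: Solver) -> None:
        """An agent announces that it can attempt a given kind of goal."""
        self._solvers.setdefault(goal, []).append(solver)

    def solve(self, goal: str, **params) -> Optional[str]:
        """Try each registered agent in turn; return the first answer found."""
        for solver in self._solvers.get(goal, []):
            answer = solver(params)
            if answer is not None:
                return answer
        return None


# Two hypothetical agents that know different ways to find an address.
def address_book_agent(params: dict) -> Optional[str]:
    book = {"Alice": "alice@example.com"}
    return book.get(params["person"])


def mail_archive_agent(params: dict) -> Optional[str]:
    seen_senders = {"Bob": "bob@example.com"}
    return seen_senders.get(params["person"])


facilitator = Facilitator()
facilitator.register("email_address", address_book_agent)
facilitator.register("email_address", mail_archive_agent)
```

A caller asks only `facilitator.solve("email_address", person="Bob")`. Whether the answer comes from an address book or from old mail is the facilitator's business, and new agents can be plugged in later without changing the request.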
Within a year of arriving at SRI, Cheyer was focused on the challenge of actually building a working version of the Knowledge Navigator software avatar that John Sculley had extolled in a futuristic video in 1987. Like Alan Kay, who started out by building “interim” Dynabooks, during the next two decades Cheyer repeatedly developed prototypes, each of which more closely approximated the capabilities of the Knowledge Navigator. He was building software virtual robots, software assistants that were intended to act as much as partners as slaves.
By the end of 1993 he had designed a tablet PC resembling an iPad. No one had developed a touch interface yet and so Cheyer had integrated pen input into his tablet, which allowed it to recognize both handwriting and user gestures, like drawing circles around certain objects to select them. It also had the ability to recognize speech, largely because Cheyer had become adept at the technology equivalent of borrowing a cup of sugar from his neighbors down the hall. He had persuaded the researchers at SRI’s Speech Technology and Research Laboratory to install a software connector—known as an API—for his tablet. That allowed him to plug the mainframe-based speech recognition system into his system. SRI’s speech technology—which was a research activity that had started with Shakey—would be spun out the next year as a separate start-up, Nuance Communications, which initially pioneered voice applications for call centers. He did the same with SRI handwriting recognition technologies. He built a demonstration system that used voice and pen input to approximate a software secretary. It automated calendar tasks and handled email, contact lists, and databases, and he started experimenting with virtual assistance tasks, like using maps to find restaurants and movie theaters.
Cheyer walked the halls and sampled the different projects at the laboratory, like natural language understanding, speech recognition, cooperating robots, and machine vision. SRI was his playground and he used it to mash together a remarkably disparate and rich set of computing systems and services—and he did it all before he saw his first Web browser. The World Wide Web was just beginning to filter out into the world. When the NCSA Mosaic browser, the first popular browser that brought the Web to the general public, finally did arrive, it felt like déjà vu.
Cheyer wanted to create an assistant that could provide a computer user with the kind of help he or she might expect to get from an attentive secretary. Although he had started on his own, over the next six years he worked with a small team of programmers and designers and created more than four dozen applications, ranging from intelligent refrigerators that would find recipes and restock themselves to televisions that let you control your home, collaborative robots, and intelligent offices. Ultimately the team would have a significant impact on mobile computing. Fifteen years later, two members of his early research group were key technology executives overseeing the design of the Samsung Galaxy smartphone and three had gone on to Apple to deliver Siri.
Cheyer quietly earned a reputation inside SRI as the “next Engelbart.” Eventually he became so passionate about Engelbart’s ideas that he kept a photo of the legendary computer scientist on his desk to remind him of his principles. By the end of the 1990s Cheyer was ready for a new challenge. The dot-com era was in full swing and he decided to commercialize his ideas. The business-to-business Internet was exploding and everywhere there were services that needed to be interconnected. His research was a perfect fit for the newly popular idea of loosely coupled control. In a world of networked computers, software that allowed them to cooperate was just beginning to be designed. He was following a similar path to Marty Tenenbaum’s, the AI researcher who had created CommerceNet, the company for which Tom Gruber built ontologies.
One of a small group of Silicon Valley researchers who realized early on that the Internet would become the glue that connected all commerce, Cheyer went to a competitor called VerticalNet, where he created a research lab and was soon made VP of engineering. Like Gruber, he was caught up in the dot-com maelstrom. At one point VerticalNet’s market value soared to $12 billion on revenues of a little more than $112 million. Of course it couldn’t last, and it didn’t. He stayed for four years and then found his way back to SRI.
DARPA knocked on Cheyer’s door with an offer to head up Tony Tether’s ambitious national CALO effort, which DARPA anticipated would draw on the efforts of AI researchers around the country. Usually DARPA would simultaneously fund many research labs and not integrate the results. The new DARPA program, however, called for SRI to marshal all the research into the development of CALO. Everyone would report to the SRI team and develop a single integrated system. Cheyer helped write the initial DARPA proposal, and when SRI received the award, he became engineering architect for the project. CALO was rooted firmly in the traditional world of first-generation symbolic artificial intelligence—planning and reasoning and ontologies—but there was also a new focus on what has been described as “learning in the wild.”
CALO had the trappings of a small Manhattan Project. Over four hundred people were involved at the peak, and the project would generate more than six hundred research papers. DARPA spent almost a quarter billion dollars on the effort, making it one of the most expensive artificial intelligence projects in history. Researchers on the CALO project tried to build a software assistant that would possess humanlike adaptability, learn from the person it worked with, and change its behavior accordingly.
When CALO passed its annual system tests, DARPA was enthusiastic. Tether awarded the project an excellence prize, and some of the technology made the transition into navy projects. But Adam Cheyer, as engineering architect, had experienced more than his share of frustrations. John McCarthy had famously asserted that building a “thinking machine” would require “1.8 Einsteins and one-tenth the resources of the Manhattan Project.” To put his estimate in perspective, since the Manhattan Project would cost more than $25 billion in current dollars, McCarthy’s estimate would mean that CALO was funded with less than one-tenth of what would be needed to build a thinking machine.
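The comparison can be checked with a few lines of arithmetic. The figures below are the round numbers quoted in the text, treated as bounds: the Manhattan Project figure is a floor ("more than") and the CALO figure is a ceiling ("almost"). The variable names are illustrative.

```python
# Round figures from the text, in current US dollars.
MANHATTAN_PROJECT_USD = 25e9   # "more than $25 billion", so a lower bound
CALO_USD = 0.25e9              # "almost a quarter billion", so an upper bound

# McCarthy's estimate: one-tenth the resources of the Manhattan Project.
thinking_machine_usd = MANHATTAN_PROJECT_USD / 10   # at least $2.5 billion

# CALO's share of that estimate, using the most generous bounds for CALO.
calo_share = CALO_USD / thinking_machine_usd

print(f"Thinking machine estimate: ${thinking_machine_usd / 1e9:.1f} billion")
print(f"CALO share of estimate: at most {calo_share:.0%}")
```

Because the true Manhattan Project cost is higher than the floor and CALO's budget is lower than the ceiling, the real share falls below the 10 percent computed here, which is the sense of "less than one-tenth."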
For Cheyer, however, the principal obstacle in designing CALO was not lack of funding. Rather it was that DARPA tried to micromanage his progress. Often unable to pursue its own agenda, the rest of the management team would shunt aside Cheyer’s ideas. He had a difficult time shepherding the huge number of teams, each of which had its own priorities and received only a small amount of funding from the CALO project. Cheyer’s entreaties to work together on a common project that integrated a huge swath of ideas into a new “cognitive” architecture largely fell on deaf ears. The teams listened politely because they were interested in the next round of money, and they would deliver software, but they all wanted to pursue their own projects. In the end there was no way that a large and bureaucratic program could have a direct impact in the real world.
To cope with his frustrations he laid out a series of side projects to work on in 2007. They ranged from efforts to commercialize the CALO technology to the creation, with several friends, of an activists’ social network called change.org. It would be a remarkably productive year for Cheyer. With a graduate student, Didier Guzzoni, he used CALO technologies to build a new software development system that eventually became the foundation for Siri. He also put together a small development team that started commercializing various other components of Siri for applications like smartphone calendars and online news reading. He also quietly helped to cofound Genetic Finance, a stealth machine-learning company that built a cluster of more than one million computers to solve financial problems such as predicting the stock market.
In the midst of all of this, Cheyer approached SRI management to ask for some IR & D funding and told them, “I want a little side project where I’m going to build my own CALO the way it should be done.” He wanted to build a single integrated system, not a patchwork quilt from dozens of different organizations. SRI agreed, and he named his project “Active Ontologies.” He ran it quietly alongside the much larger operation.
The project gained more traction when a key group of SRI technical leaders met for a daylong retreat in Half Moon Bay, a beach town close to the Menlo Park laboratory. There had been growing pressure to commercialize SRI research from the CEO, Curt Carlson, and CALO was an obvious candidate. The retreat was crucial for hashing out answers to basic questions about the goals for the software, like: What should a personal assistant “feel” like? Should they use an avatar design? Avatars had always been a controversial aspect of the design of virtual assistants. Apple’s Knowledge Navigator video had envisioned a prim young male with a bow tie who looked a bit like Steve Jobs. The CALO project, on the other hand, did not have an avatar. The developers went back and forth on whether the system should be a chatbot, the kind of personal companion that researchers had explored for decades in programs like Eliza that engaged human users in keyboard “conversations.” In the end, they came to a compromise. They decided that nobody was going to sit and chat with a virtual robot all day. Instead, they were going to design a system for people who needed help managing their busy day-to-day lives.
The group came up with the notion of “delight nuggets.” Because they were trying to create a humanlike persona, they decided to sprinkle endearing phrases into the software. For example, if a user asked the system about the forecast for the day, the system would answer—and if the forecast indicated that it would rain, the system would add: “And don’t forget your umbrella!” The developers wanted to give the user what he or she wanted and to make the design goal about helping them manage their lives—and then to surprise them, just a little bit. Including these phrases added a touch of humanity to the interaction, even though systems did not yet feature speech synthesis and speech recognition.
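The umbrella example amounts to a simple rule: answer the question first, then add a friendly aside when the content warrants it. A minimal sketch, assuming the forecast arrives as a plain string; the function name and the keyword test are illustrative and not taken from the Siri code.

```python
def answer_forecast(forecast: str) -> str:
    """Give the user the forecast, plus a "delight nugget" when rain is expected."""
    reply = f"Today's forecast: {forecast}."
    # The nugget is an extra, appended only after the real answer is given.
    if "rain" in forecast.lower():
        reply += " And don't forget your umbrella!"
    return reply
```

Calling `answer_forecast("Rain in the afternoon")` appends the umbrella reminder, while `answer_forecast("Sunny and mild")` returns the forecast alone.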
The 2007 meeting served as a launchpad. SRI’s commercialization board gave the team the approval to begin looking for outside money in August. The name Siri ended up working on a wonderful range of levels. Not only did it mean “secret” in Swahili, but Cheyer had once worked on a project called Iris, which was Siri spelled backward. And of course, everyone liked that the name was also a riff on SRI.
In 1987 Apple’s chief executive John Sculley gave a keynote address at Educom, the national educational technology conference. He showed a promotional video that a small Apple team had produced to illustrate the idea of something he described as the Knowledge Navigator. At the time, the video, which caught the public’s eye (went “viral” in today’s parlance), seemed impossibly far out. The Knowledge Navigator was a tour de force that pointed the way to a computing world beyond the desktop computer of the mid-1980s. Knowledge Navigator ultimately spawned a seemingly endless stream of high-tech Silicon Valley “vision statements,” including one from Microsoft in 1991 presented by Bill Gates called “Information at Your Fingertips.” Yet at that time, the Knowledge Navigator was early to offer a compelling vision for a future beyond desktop personal computing. The video centered on a conversation between an absentminded professor and a perky, bow-tied on-screen avatar as a guide for both the professor’s research and his day-to-day affairs. It sketched a future in which computer interaction was no longer based on a keyboard and mouse. Instead, the Knowledge Navigator envisioned a natural conversation with an intelligent machine that both recognized and synthesized human speech.
Brought to Apple as chief executive during the personal computing boom, Sculley started his tenure in 1983 with a well-chronicled romance with Apple’s cofounder Steve Jobs. Later, when the company’s growth stalled in the face of competition from IBM and others, Sculley fought Jobs for control of the company, and won.
However, in 1986, Jobs launched a new computer company, NeXT. Jobs wanted to make beautiful workstations for college students and faculty researchers. That placed pressure on Sculley to demonstrate that Apple could still innovate without its original visionary. Sculley turned to Alan Kay, who had left Xerox PARC first to create Atari Labs and then came to Apple, for guidance on the future of the computer market. Kay’s conversations with Apple’s chief executive were summarized in a final chapter in Sculley’s autobiographical Odyssey. Kay’s idea centered on “a wonderful fantasy machine called the Knowledge Navigator,”4 which wove together a number of his original Dynabook ideas with concepts that would ultimately take shape in the form of the World Wide Web.
Alan Kay would later say that John Sculley had asked him to come up with a “modern Dynabook,” which he found humorous, since at the time his original Dynabook still didn’t exist. He said that in response to Sculley’s request, he had pulled together a variety of ideas from his original Dynabook research and the artificial intelligence community, as well as from MIT Media Laboratory director Nicholas Negroponte, an advocate of speech interfaces.5 Negroponte had created the Architecture Machine Group at MIT in 1967, in part inspired by the ideas of Ivan Sutherland, whose “Sketchpad” Ph.D. thesis was a seminal work in both computer graphics and interface design.
Historians have underestimated Negroponte’s influence on Apple and the computer industry as a whole. Although Negroponte’s “Architecture Machine” idea never gained popular traction, it did have a very specific impact on Bill Atkinson, one of the principal designers of Apple’s Lisa and Macintosh computers. Many of the ideas for Lisa and Macintosh were generated from Negroponte’s early efforts to envision what the field of architecture would be like with the aid of computers. Negroponte’s group created something called “DataLand,” a prototype of a visual data management system. In many ways, DataLand was a much broader exploration of how human computer users might interact with information more fluidly. It was certainly broader in scope than the projects at PARC, which focused more narrowly on creating a virtual desktop. Indeed, Negroponte’s goal was expansive. DataLand allowed users to view, in a special room, an immersive information environment back-projected on a giant piece of polished glass as a series of thumbnails representing everything from documents to maps. It was like using a Macintosh or a Windows computer, but having the control screen surround you rather than appear on a small display. It was possible to zoom in on and “fly” through the virtual environment by using a joystick, and when you got close to objects like files, they would talk to you (e.g., “This is Nicholas’s Calendar”) in a soothing voice. Atkinson visited Negroponte’s lab and thought this kind of interface could solve Apple’s document filing problem. He wanted to organize documents spatially and place them in proximity to other related documents. Although it was a fascinating concept, it proved unwieldy in practice, and the group returned to something closer to the PARC desktop ideas.
Kay “channeled” ideas that he had gathered in his discussions with Negroponte, passing them on both to Sculley and to the group that created the Knowledge Navigator video. Kay credited Negroponte with playing what he called the “Wayne Gretzky Game”—skating to where the puck was going, rather than where it was. Kay had eagerly read Gordon Moore’s early Electronics article, which was bold enough to sketch the progress of computing power ten years into the future—1975.6 He drew the line out to 1995 and beyond. This future-oriented approach meant that he could assume that 3-D graphics would be commercially available within just several decades.
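Drawing the line out is an exercise in compound doubling. The sketch below assumes a fixed doubling period; the one-year and two-year periods are commonly cited readings of Moore's projections and are assumptions here, not figures from the text. The function name is illustrative.

```python
def growth_factor(start_year: int, end_year: int, doubling_years: float) -> float:
    """How many times capacity multiplies if it doubles every `doubling_years`."""
    return 2.0 ** ((end_year - start_year) / doubling_years)


# Ten years of annual doubling, the span of the original article's forecast.
decade = growth_factor(1965, 1975, doubling_years=1.0)

# Extending the line twenty more years at an assumed two-year doubling period.
extended = growth_factor(1975, 1995, doubling_years=2.0)

print(f"1965 to 1975: about {decade:,.0f}x")
print(f"1975 to 1995: about {extended:,.0f}x")
```

Under these assumptions each span yields roughly a thousandfold gain, which is why a capability that looked impossibly expensive in one decade could be planned for as ordinary in a later one.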
Negroponte represents the “missing link” between Norbert Wiener’s early insights into computing and its consequences, the early world of artificial intelligence, and the explosive rise of the personal computer industry during the 1980s. In the late sixties, Negroponte was teaching a course on computer-aided design for architects at MIT. He was not a great fan of lecturing and so had perfected a Tom Sawyer approach to his course—he brought in many guest lecturers. He attracted a dazzling and diverse array of talent. Isaac Asimov, for example, was living in Cambridge at the time and came to Negroponte’s class to speak each year, as did Gordon Pask, a British cyberneticist who was traveling widely in U.S. computer research circles in the 1960s and 1970s. If Kay was influenced by Negroponte, he in turn would point to the influence and inspiration of Gordon Pask. At the beginning of the interactive computing era Pask had a broad but generally unchronicled influence on computer and cognitive science research in the United States. Ted Nelson met him in the hallways of the University of Illinois Chicago Circle campus and fell under his spell as well. He described Pask affectionately in his Computer Lib manifesto as the “maddest of mad scientists.”
In 1968, Negroponte, like many in the computing world, was deeply influenced by Ivan Sutherland’s 1963 Ph.D. project, Sketchpad, a graphical and interactive computing tool that pioneered human-computer interaction design. Following in Sutherland’s footsteps, Negroponte began work on an “Architecture Machine” that was intended to help human architects build systems that exceeded their individual intellectual grasp. His first effort to build the machine was a software program called URBAN5. The year after he created it, he took a video of his early Architecture Machine project to the London art exhibition known as Cybernetic Serendipity, which was held at the Institute of Contemporary Arts. The exhibition had assembled a wild variety of mechanical and computerized art exhibits, including large mobiles created by Gordon Pask, designed with interactive parts to permit viewers to enter into a “conversation” with his installations.
The two met at the exhibition and became good friends. Pask would come to visit the Architecture Machine Group three or four times a year for a week at a time and always stayed as a guest at Marvin Minsky’s home. He was a striking character who dressed the part of an Edwardian dandy, complete with a cape, and who occasionally lapsed into double-talk and wordplay. He was squarely in the Norbert Wiener cybernetics tradition, which had taken hold with more force in Europe than in the United States. Pask was also subtly but significantly at odds with the prevailing artificial intelligence world. If AI was about making smart machines that mimicked human capabilities, cybernetics was focused instead on the idea of creating systems to achieve goals.7 Gordon Pask’s insight into the nature of intelligence, which he situated not in the individual but in a conversation between people, strongly influenced Negroponte. Indeed, it was Pask who laid the groundwork for viewing human-machine interactions as conversations, an idea later demonstrated by Knowledge Navigator and still later realized in Siri: he “conceived human-machine interaction as a form of conversation, a dynamical process, in which the participants learn about each other.”8
Negroponte was early to grasp Pask’s ideas about computer interaction. Later in the 1970s, Pask’s ideas also influenced Negroponte’s thinking and design at the MIT Media Laboratory as he broadened the original mission of the Architecture Machine Group. Ideas from the Media Lab filtered into Apple’s strategic direction because Kay was close to Negroponte and was spending time teaching there. Few noticed it at the time, but Apple’s release of Siri as a critical addition to the iPhone 4S in October of 2011 fell within two weeks of Kay’s predicted release date for the Knowledge Navigator. The idea traced a direct path from Pask to Negroponte to Kay to the Siri team. A parallel thread ran through Gruber’s original work on computing tools for the disabled, to his work on Intraspect, and to the new project at SRI. In the space of just a generation, a wave of computer-mediated communication technology had inaugurated a new way of facilitating collaboration between humans and machines. Gruber recognized that humans had evolved from using tribal communication to written language, and then quickly to using the telephone and computer communications.
Computing had become a prosthesis, not in a bad sense, but rather as a way to augment human capabilities as first foreseen by Vannevar Bush, Licklider, and Engelbart. Intraspect and Hypermail had been efforts to build a cognitive prosthesis for work that needed to go beyond the size of a small tribe. The nature of collaboration was changing overnight. People could have conversations when they weren’t in the same room, or even in the same time zone. Simple online email lists like www-talk were being used to develop new Web standards. A permanent archive made it possible for new participants to quickly get up to speed on various issues by reading a record of past conversation.
The idea of the archive would become the guiding principle in the development of Siri. The SRI engineers developed an external memory that provided notes, reminders, schedules, and information, all in the form of a human conversation. The Siri designers adapted the work done on CALO and polished it. They wanted a computer that would take over the task of secretary. They wanted it to be possible to say, “Remind me to call Alan at three thirty or on my drive home.”
Just before Cheyer’s project was renamed Siri, Gruber would arrive to work with the tiny team at SRI that included Cheyer and Dag Kittlaus. Kittlaus had been managing mobile communications projects at Motorola before coming to SRI. They code-named the project HAL, with only a hint of irony.
Cheyer was charming, but he was also fundamentally a highly technical engineer, and for that reason could never be the head of a company. Kittlaus was the opposite. A good-looking, tanned Norwegian executive who straddled the line between technology development and business, he was a quintessentially polished business development operator. He had done early work on the mobile Internet in Europe. Kittlaus arrived with a broad charter, having been asked by the lab’s managers to come in as an “entrepreneur-in-residence.” There wasn’t any particular assignment; he was just supposed to look around and find something promising. It was Kittlaus who found Cheyer. He immediately realized that Cheyer was a hidden gem.
They had first met briefly when Cheyer had been demonstrating prototypes for the wireless industry based on his OAA work in the 1990s. There had been some interest from the telecommunication industry, but Cheyer had realized that there was no way that his toy demos, written in the Prolog artificial intelligence language, would be something that could be used by millions of mobile phone users.
Although SRI later took pains to draw the links between CALO and Siri in order to garner a share of the credit, it was Cheyer who had dedicated his entire career to pursuing the development of a virtual assistant and natural language understanding. When Kittlaus first saw Cheyer’s work on Siri in 2007, he told him, “I can make a company out of this!” Cheyer, however, wasn’t immediately convinced. He didn’t see how Kittlaus could commercialize Siri, but he agreed to help him with the demos. Kittlaus won him over after buying him an iPhone, which had just been released. Cheyer had a very old Nokia and no interest in the new smartphone gadgets. “Play with this!” Kittlaus told him. “This thing is a game changer. Two years from now there will be a competitive response and every handset manufacturer and telco will be desperate to compete with Apple.” Since bandwidth would still be slow and screens would still be small, the companies that tried to compete with Apple would have to look for any competitive advantage they could find.
They were planning a start-up and so they began looking for a technical cofounder, but they also needed an outsider to assess the technology. That search led them to Tom Gruber. Cheyer and Kittlaus prepared a simple demo for Gruber that appeared in Mosaic, an early Web browser. Users could type a question into a search box and it would respond. At the outset Gruber was skeptical.
“I’ve seen this before, you guys are trying to boil the ocean,” he told Cheyer.
The program seemed like a search engine, but then Cheyer began to reveal all the AI components they had integrated into the machine.
Gruber paused. “Wait a moment,” he said. “This isn’t going to be just a search engine, is it?”
“Oh no,” Cheyer responded. “It’s an assistant.”
“But all you’re showing me is a search engine. I haven’t seen anything about an assistant,” Gruber replied. “Just because it talks to me doesn’t mean anything.”
He kept asking questions and Cheyer kept showing him hidden features in the system. As he continued the demonstration, Gruber started to run out of steam and fell silent. Kittlaus chimed in: “We’re going to put it on phones.”
That took Gruber by surprise. At that point, the iPhone had not yet become a huge commercial success.
“This phone is going to be everywhere,” Kittlaus said. “This is going to completely change the world. They are going to leave the BlackBerry behind and we want to be on this phone.” Gruber had spent his time designing for personal computers and the World Wide Web, not mobile phones, so hearing Kittlaus describe the future of computing was a revelation.
In the mid-2000s, keyboards on mobile phones were a limiting factor and so it made more sense to include speech recognition. SRI had been at the forefront of speech recognition research for decades. Nuance, the largest independent speech recognition firm, got its start as an SRI spin-off, so Cheyer understood the capabilities of speech recognition well.
“It’s not quite ready yet,” he said. “But it will be.”
Gruber was thrilled. Cheyer had been the chief architect of the CALO project at SRI, and Kittlaus had deep knowledge of the mobile phone industry. Moreover, Cheyer had access to a team of great programmers who were equipped with the necessary skills to build an assistant. Gruber realized immediately that this project would reach an audience far larger than anything he had worked on before. In order to succeed, though, the team needed to figure out how to design the service to interact well with humans. From his time at Intraspect and Real Travel, Gruber understood how to build computing systems for use by nontechnical consumers. “You need a VP of design,” he told them. It was clear to Gruber that he had the opportunity to work with two of the world’s leading experts in their fields, but he had just left an unsuccessful start-up himself. Did he want to sign up again for the crazy world of a start-up so soon?
“Do you need a cofounder?” Gruber asked the two men at the end of the meeting.
The core of the team that would build Siri was now in place.
Independently, the three Siri founders had already spent a lot of time pitching investors in the area for funding for earlier projects. In the past, this had been an onerous chore for Gruber, since it required countless visits to venture capitalists who were often uninterested, arrogant, or both. This time their connection to SRI opened the doors to the Valley’s blue-chip venture firms. Dag Kittlaus was a master showman, and on their tour of the venture capital firms on Sand Hill Road, he developed a witty and charming pitch. He would take Cheyer and Gruber in tow to each fund-raising meeting. The men would then be escorted into a conference room and after they introduced themselves, Kittlaus innocently asked the VCs, “Hey, do any of you have one of those newfangled smartphones?” The VCs thrust their hands in their pockets and almost always retrieved Apple’s then-brand-new iPhone.
“Do you have the latest apps downloaded?” Kittlaus asked.
“Do you have Google search?”
Kittlaus then placed a twenty-dollar bill on the table and told the VCs, “If you can answer three questions in five minutes, you can walk away with my money.” He then asked the VCs three questions, the answers to which were difficult to search on Google or other similar apps. The venture capitalists listened to the questions and then either said, “Oh, I don’t have that app,” or made their way through multiple browser pages, following various hyperlinks in an effort to synthesize an answer. Inevitably, the VCs failed to answer even one of the questions in the time allowed, and Kittlaus never lost his money.
It was a clever way for the team to force the potential investors to visualize the need for the missing Siri application. To help them, the team put together fake magazine covers. One of them read: “The End of Search—Enter the Age of the Virtual Personal Assistant.” Another one featured an image of Siri crowding Google off the magazine cover. The Siri team also built slides to explain that the Google search was not the end point in the world of information retrieval.
Ultimately the team would be vindicated. Google was slow to come to a broader, more conversational approach to gathering and communicating information. Eventually, however, the search giant would come around to a similar approach. In May of 2013, Amit Singhal, head of the company’s “Knowledge” group, which includes search technology, kicked off a product introduction by proclaiming “the end of search as we know it.” Four years after Siri had arrived, Google acknowledged that the future of search was conversation. Cheyer’s jaw hit the floor when he heard the presentation. Even Google, a company that was all about data, had moved away from static search and in the direction of assistance.
Until they toured Sand Hill for venture capital, Adam Cheyer had been skeptical that the venture community would buy into their business case. He kept waiting for VCs to toss them out of their meetings, but it never happened. At this point, other companies had released less-impressive voice control systems that had gone bust. General Magic, the once high-flying handheld computing Apple spin-off, for example, had tried its hand at a speech-based personal assistant before going out of business in 2002. Gradually, however, Cheyer realized that if the team could develop a really good technical assistant, the venture capitalists and the money would follow.
The team had started looking for money in late 2007 and they were funded before the end of that year. They had initially visited Gary Morgenthaler, one of Silicon Valley’s elder statesmen and an influential SRI contact, for advice, but Morgenthaler liked the idea so much that he invited them back to pitch. In the end, the team picked Morgenthaler and Menlo Ventures, another well-known venture firm.
Before the dot-com era, companies kept their projects under wraps until they were ready to announce their developments at grand publicity events, but that changed during the Silicon Valley buildup to the bubble in the late 1990s. There was a new spirit of openness among more service-oriented new companies, which shared information freely and raced to be first to market. The Siri developers, however, decided to stay quiet; they even used the domain name stealth-company.com as a placeholder and a tease. They found office space in San Jose, far away from the other software start-ups that frequently settled in San Francisco. Having a base in San Jose also made it easy to find new talent. At the time, technical workers with families were moving to the south end of the Peninsula, and commuting to downtown San Jose was a breeze compared to the trek to Mountain View or Palo Alto.
To build the company culture, Adam Cheyer went out and bought picture frames and handed them out to all of the company’s employees. He asked everyone to choose a hero and then put a framed picture of that person on their desks. Then, he asked them to pick a quote that exemplified why that person was important to them. Cheyer hoped this would serve two purposes: it would be interesting to see who people chose, and it would also reflect something about each employee. Cheyer chose Engelbart and attached an early commitment made by the pioneering SRI researcher: “As much as possible, to boost mankind’s collective capability for coping with complex, urgent problems.” For Cheyer, the quote perfectly expressed the tension between automating and augmenting the human experience. He had always harbored a tiny feeling of guilt about his work as he moved between what he thought of as “people-based” systems and artificial intelligence-based projects. His entire career had vacillated between the two poles. It was 2007, the year that he also helped his friends start the activist site change.org, which fell squarely within the Engelbart tradition, and he believed that Siri was moving along the same path. Gruber had wanted to choose Engelbart as well, but when Cheyer chose him first he fell back on his musical hero, Frank Zappa.
Despite having his project sold to Tymshare in the late 1970s, Doug Engelbart had been brought back into the fold at SRI when Cheyer had arrived, and Cheyer had come to know the aging computer scientist as a father figure and a guiding light. Working on projects that were inspired by Engelbart’s augmentation ideas, he had tried to persuade Engelbart that he was working in his tradition. It had been challenging. By the 1990s, Engelbart, who had mapped it all out beginning in the 1960s, was a forlorn figure who felt the world had ignored him. It didn’t matter to Cheyer. He saw the power of Engelbart’s original vision clearly and he took it with him when he left SRI to build Siri.
In college, Cheyer had begun visualizing goals clearly and then systematically working to achieve them. One day just as they were getting started he wandered into an Apple Store and saw posters with an array of colorfully crafted icons representing the most popular iPhone applications. All of the powerful software companies were there: Google, Pandora, Skype. He focused on the advertising display and said to himself: “Someday Siri is going to have its icon right here on the wall of an Apple Store! I can picture it and I’m going to make this happen.”
They went to work. In Gruber’s view, the team was a perfect mix. Cheyer was a world-class engineer, Kittlaus was a great showman, and Gruber was someone who could build high-technology demos that wowed audiences. They knew how to position their project for investors and consumers alike. They not only anticipated the kinds of questions people would ask during a demo; they also researched ideas and technology that would have the most crowd appeal. Convincing the observer that the future was just around the corner became an art form unique to Silicon Valley. The danger, of course, was being too convincing. Promising too much was a clear recipe for disaster. Other personal assistant projects had failed, and John Sculley had publicized a grand vision for Knowledge Navigator, which he never delivered. As Siri’s developers kicked the project into high gear, Gruber dug out a copy of the Knowledge Navigator video. When Apple had shown it years earlier, it had instigated a heated debate within the user interface design community. Some argued, and still argue, against the idea of personifying virtual assistants. Critics, such as Ben Shneiderman, insisted that software assistants were both technically and ethically flawed. They argued for keeping human users in direct control rather than handing off decisions to a software valet.
The Siri team did not shy away from the controversy, and it wasn’t long before they pulled back the curtain on their project, just a bit. By late spring 2009, Gruber was speaking obliquely about the new technology. During the summer of that year he appeared at a Semantic Web conference and described, point by point, how the futuristic technologies in the Knowledge Navigator were becoming a reality: there were now touch screens that enabled so-called gestural interfaces, there was a global network for information sharing and collaboration, developers were coding programs that interacted with humans, and engineers had started to finesse natural and continuous speech recognition. “This is a big problem that has been worked on for a long time, and we’re beginning to see some progress,” he told the audience. Gruber also pointed to developments that were on the horizon, like conversational speech between a computer agent and a human and the delegation of tasks to computers—like telling a computer: “Go ahead, make that appointment.” Finally, he noted, there was the issue of trust. In the Knowledge Navigator video, the professor had let the computer agent handle calls from his mother. If that wasn’t a sign of trust, what was? Gruber hoped his technology would inspire that same level of commitment.
After discussing the technologies forecasted in the Knowledge Navigator video, Gruber teased the audience. “Do we think that this Knowledge Navigator vision is possible today?” he asked. “I’m here to announce”—he paused slightly for effect—“that the answer is still no.” The audience howled with laughter and broke into applause. He added, “But we’re getting there.”
The Siri designers discovered early on that they could quickly improve cloud-based speech recognition. At that point, they weren’t using the SRI-inspired Nuance technology, but instead a rival system called Vlingo. Cheyer noticed that when speech recognition systems were placed on the Web, they were exposed to a torrent of data in the form of millions of user queries and corrections. This data set up a powerful feedback loop to train and improve Siri.
The developers continued to believe that their competitive advantage would be that the Siri service represented a fundamental break with the dominant paradigm for finding information on the Web—the information search—exemplified by Google’s dramatically successful search engine. Siri was not a search engine. It was an intelligent agent in the form of a virtual assistant that was capable of social interaction with humans. Gruber, who was also chief technology officer at Siri, laid out the concepts underlying the service in a series of white papers in the form of technical presentations. Finding information should be a conversation, not a search, he argued. The program should be capable of disambiguating questions to refine the answers to human questions. Siri would provide services—like finding movies and restaurants—not content. It would act as a highly personalized broker for the human user. In early 2010 the Siri team put together an iPhone demonstration for their board of directors. Siri couldn’t speak yet, but the program could interpret spoken queries and converse by responding to human queries in natural language sentences that were displayed in cartoonlike bubbles on screen. The board was enthusiastic and gave the developers more time to tune and polish the program.
In February of 2010, the tiny start-up released the program on the iPhone App Store. They received early positive reviews from the Silicon Valley digerati. Robert Scoble, one of the Valley’s high-profile technology bloggers, referred to it as “the most useful thing that I’ve seen so far this year.” Faint praise perhaps—it was still very early in the year.
Gruber was away at a technology retreat during the release and had almost no access to the Web when the product was first available. He had to rely on secondhand reports—“Dude, have you seen what’s happening to your app?!”—to keep up.
It got better. Thanks to a clever decision to place the application in a less obvious category on the App Store—Lifestyle—the Siri Assistant immediately shot right to the top of the category. It was one of the tricks Gruber had learned during his time at Real Travel—the art of search engine optimization. Although they had introduced Siri on the iPhone, Kittlaus had negotiated a spectacular agreement with Verizon, which did not yet carry the iPhone. He described it as “the greatest mobile deal in history.” The deal guaranteed that Siri would be on every new Verizon phone, which meant that the software would become the poster child for the Android smartphone. The deal was almost set in stone when Kittlaus received a call on his cell phone.
“Hi, Dag,” the caller said. “This is Steve Jobs.”
Kittlaus was momentarily stunned. “How did you get this phone number?” he asked.
“It’s a funny story,” Jobs replied. He hadn’t had any idea how to find the small development team, but he had hunted around. Because every iPhone developer had to supply a phone number to the App Store, Apple’s CEO found Kittlaus’s number in his developer database.
The team’s first foray into the legendary “reality distortion field”—Jobs’s personal brand of hypnotic charisma—wasn’t promising. Jobs invited the trio of Siri developers to his house in the heart of old Palo Alto. Jobs’s home was a relatively low-key 1930s Tudor-style house set next to an empty lot that he had converted into a small grove of fruit trees and a garden. They met in the living room, which was sparsely furnished, like much of Jobs’s home, and featured an imposing Ansel Adams original.
Jobs presented the trio with a dilemma. They had all been successful in Silicon Valley, but none of them had yet achieved the career-defining IPO. The Siri team—and certainly their board members—thought it was very possible that they would receive a huge public stock offering for Siri. Jobs made it clear that he wanted to acquire Siri, but at that juncture the team wasn’t planning to sell. “Thank you very much,” they told him, and then left.
Several weeks later Apple was back. They were once again invited to Jobs’s home, where Jobs, then clearly sick despite continuing to publicly deny it, turned on the charm. He promised them an overnight market of one hundred million users—with no marketing and no business model. Or, Jobs said, they could roll the dice, try to be the next Google, and slog it out. The Siri team also understood that if they went with Verizon, they would run the risk of being shut out of the iTunes Store. Steve didn’t have to say it, but it was clear that they had to choose which half of the market they wanted.
Jobs’s offer sold them, but it didn’t immediately sell the board, which was by now eager for an IPO exit. The three founders had to reverse course and persuade their board members. Ultimately the investors were convinced; Jobs’s offer was lucrative enough and offered much lower risk.
Soon after Apple acquired Siri in April of 2010, the Siri team moved into the very heart of the office space for Apple’s design group, on half of the top floor of Infinite Loop 2. Although Apple could have licensed Nuance to convert speech directly to text—which Google later did—Jobs decided that Apple would undertake the more ambitious task of placing an intelligent assistant software avatar on the iPhone. Siri helped solve another major problem that Apple had with its new iPhone and iPad. Glass screens and multitouch control could replace a keyboard and mouse for navigation through screens, but they did not work well for data entry. This was a weak point, despite Jobs’s magnificent demonstration of text entry and auto-correction during the first product introduction. Speech entry of single words or entire sentences is many times more rapid than painstakingly entering individual words by poking at the screen with a finger.
In the beginning, however, the project was met with resistance within the company. Apple employees would refer to the technology as “voice control,” and the Siri team had to patiently explain that their project had a different focus. The Siri project didn’t feed into the “eye candy” focus at Apple—the detailed attention to software and hardware design that literally defined Apple as a company—but was instead about providing customers with reliable and invisible software that worked well. But many engineers in the software development organization at Apple thought that if Steve—and later on one of his top lieutenants, Scott Forstall—didn’t say “make it happen,” they didn’t need to work on that project. After all, Apple was not recognized as a company that developed cloud-computing services. Why reinvent the wheel? An assistant or simply voice control? How much difference would it really make? In fact, people were dying while reading email and “driving while intexticated,” so presenting drivers with the ability to use their phones safely while driving made a tremendous difference.
When Apple’s project management bureaucracy balked at the idea of including the ability to send a hands-free text message in the first version of the software, Gruber, who had taken the role of a free-floating technical contributor after the acquisition, said he would take personal responsibility for completing the project in time for the initial Apple Siri launch. He decided it was a “put your badge on the table” issue. With just a summer intern in tow, he worked on all of the design and prototyping for the text messaging feature. He begged and borrowed software engineers’ time to help build the system. In the end, it was accepted. At the time of Siri’s launch, it was possible to send and receive texts without touching the iPhone screen.
Not everything went as smoothly, however. The Siri team also wanted to focus on what Gruber called “attention management.” The virtual personal assistant should also help people remember their “to-do list” in an “external memory” so they wouldn’t have to. The original Siri application had an elaborate design for what the team described as “personal memory”: it wove an entire set of tasks together in the right order, prodding the user at each step like a good secretary. In the race to bring Siri to the iPhone, however, much of the deeper power of the service was shelved, at least temporarily. The first iteration of Siri only included a small subset of what the team had originally created.
In his final act in the computing world, Steve Jobs had come down emphatically on the side of the forces of augmentation and partnership. Siri was intended to be a graceful, understated model for the future collaboration between humans and machines, and it marked the beginning of a sea change at Apple that would take years to play out. The project also came together in a furious rush, and sadly Jobs died the day after Siri’s debut. The product launch event in October 2011 thus had to acknowledge a muted counterpoint in what was otherwise a glorious crowning moment to their rocket-fast three-year crusade. Naturally, there was a shared feeling of triumph. On the morning of Siri’s unveiling, Cheyer found himself back in an Apple Store. He walked up to the store and next to the front door was a giant plasma display that read: “Introducing Siri!”