Age of Centaurs - Chatbots and the Rise of Hybrid AI

by Manuel Ebert

Published May 22, 2016 in Technology

You are probably wondering why I would want to talk about mythical creatures at a data science conference and what they have to do with artificial intelligence after all.

So, today I’m going to talk about three things. First, I’ll give you a little background on “conversational” AI, and how we got to where we are now. Then I'll walk you through the current landscape of chatbots and artificial assistants like Siri or the one Google just released. I'll explain how they work on a technical level, but I know it's one of the last talks of a super intense conference, so I'll keep it light, and if you have more detailed questions, please feel free to ask later. And then, finally, back to the centaurs, and a breakthrough in AI that hasn't quite received the attention it deserves yet.

But before I take you on a trip through the wardrobe, let’s look at where it all started.

# 1. Artificial Intelligence - Where are we at?

Ok, today everybody is all crazed about AI and deep learning and chatbots. That wasn’t always the case. When I started college, AI was thought to be more or less dead. I remember when I applied for a scholarship and was interviewed by a computer science professor who asked me what I was particularly interested in. I said AI, and she just looked at me like a mother telling her teenage son that Santa is dead. She just moved a little closer and said, quietly: “You know AI has failed, don’t you?”

15 years ago, Artificial Intelligence was the Cold Fusion of computer science. Toxic. In academia, you couldn’t get any grants, and only a handful of startups were doing AI.

How did this happen? Well, to start with, expectations for AI were rather lofty. When Minsky and McCarthy coined the term “Artificial Intelligence” in 1956, they thought that most problems of AI could be quickly solved during a summer camp. This was their proposal:

> The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it. An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves. We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer.

We all know engineers are prone to underestimating how long projects will take and missing their deadlines, but those guys were off by 60 years and counting! Remember Rosie from the Jetsons? It’s been 60 years, and all we got was a lousy Roomba! To be fair though, the Jetsons also promised us flying cars. And the Roomba does indeed clean the floor, but it’s a very poor conversation partner. So let’s focus on one particular aspect of AI, language.

Maybe you’ve heard of ELIZA, created by Joseph Weizenbaum in 1964. She was an early chatbot resembling a Rogerian psychotherapist. So, naturally, I consulted her when I suffered from anxiety over writing this talk a week ago:

ELIZA: Please tell me what's been bothering you.
YOU: I have to prepare a talk for a conference, and suffer from writer's block.
ELIZA: Can you elaborate on that?
YOU: I should be writing, but instead I succumb to Netflix and Guilt.
ELIZA: That's quite interesting.
YOU: What should I do?
ELIZA: You say what should you do?
YOU: Yes, I said what should I do?
ELIZA: I understand.
YOU: Do you really?
ELIZA: We were discussing you -- not me!

Remember: this program was written in the early sixties. And you can probably already guess how it worked: it takes what I say and applies some symbolic transformations to the input to formulate its response. “What should I do?” becomes “You say what should you do?”. And that’s where we got stuck for almost fifty years.
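To make that kind of symbolic transformation concrete, here is a minimal sketch in Python. It is not Weizenbaum's actual program: the `REFLECTIONS` table and the `reflect` and `respond` functions are made up for illustration, and the real ELIZA script had many more rules than this one pronoun swap.

```python
import re

# Hypothetical pronoun table (an assumption for this sketch);
# the real ELIZA script was far richer than this.
REFLECTIONS = {
    "i": "you",
    "me": "you",
    "my": "your",
    "am": "are",
    "you": "I",
    "your": "my",
}


def reflect(text: str) -> str:
    """Swap first- and second-person words, token by token."""
    words = re.findall(r"[a-z']+", text.lower())
    return " ".join(REFLECTIONS.get(word, word) for word in words)


def respond(utterance: str) -> str:
    """Echo the user's input back as a question."""
    return f"You say {reflect(utterance)}?"


print(respond("What should I do?"))
```

There is no model of meaning anywhere in there. The program only rearranges the symbols it was handed.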

Most of the world didn’t really hear anything new from these “conversational AIs” until Apple introduced the Speech Interpretation and Recognition Interface, or in short, Siri in 2011. What happened in those fifty years? One of the problems was Minsky’s and McCarthy’s early assumption that human cognition was nothing but some advanced symbol manipulation, a problem solving skill. We lost decades of AI research because the intellectual elite thought that the pinnacle of human cognition is playing chess, not peekaboo.

# 1.1 How do conversational AIs work?

Okay, so much for the history lesson, back to the future. I just mentioned Siri as one of the few examples of conversational AIs that have found widespread adoption. Let’s take a brief look at how Siri works, and why.

The people behind Siri made one important twist to Natural Language Processing. Instead of starting out with understanding what a sentence means, they start with what a user would want to do. Getting weather information, scheduling a meeting, or calling someone are all such intents, and Siri has literally hundreds of them. All of these intents were written and designed by humans. That means that we limit the number of possible utterances we have to understand to a restricted domain.

Alright, now we have this large set of intents, of possible things the user can ask for. But there are still so many ways to ask for weather information. What's the weather outside? Should I wear a scarf? Is it going to rain later? Is it sunny in Philadelphia? It’s just not scalable to hard-code every possible way people ask for the weather.

So when I ask Siri what the weather is like, here is what actually happens. First, the wave form is compressed and sent to a server, where a speech recognition algorithm deciphers what I said and returns it as plain text. Now that we have a text representation of my request, we turn it into a numerical representation. That step is called vectorization, and there are different ways of doing it. Siri mostly uses a fancy way of counting how often each word in the entire dictionary occurs in the sentence. Google has a different approach using a neural network for this task in a process called Word2Vec. They also pepper in a lot of syntactic information, so they use syntactic analysis to determine what the main verb is, what's the object and the subject and so on. They actually just two weeks ago released their syntactic parser with the catchy name Parsey McParseface. Either way, at the end of this process my request is simply a multi-dimensional vector, and we can compare it to our training set using a classification algorithm.
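The simplest version of that counting step is a plain bag of words, and a rough Python sketch looks like the following. This is an illustration under assumptions, not Siri's actual vectorizer: the helper names `tokenize`, `build_vocabulary` and `vectorize` are invented here, and the "dictionary" is just the words in two example sentences.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(sentences: list[str]) -> list[str]:
    """Collect every distinct word, in a fixed (sorted) order."""
    return sorted({token for s in sentences for token in tokenize(s)})


def vectorize(text: str, vocabulary: list[str]) -> list[int]:
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


# Toy "dictionary" built from two example requests (an assumption).
corpus = ["What's the weather outside?", "Is it going to rain later?"]
vocab = build_vocabulary(corpus)

print(vocab)
print(vectorize("Is it sunny outside, or is it going to rain?", vocab))
```

Words that are not in the vocabulary, like "sunny" here, simply drop out of the vector. Every request ends up with the same number of dimensions, which is what lets us compare it against a training set.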

There are many of these around, and I don't know which ones Siri is using in particular, but the important part here is that we map an utterance to an intent *without* having to understand every single word in the sentence, without having to know what a scarf is or how to construct a subjunctive clause.
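As one possible illustration, here is a nearest-neighbor classifier over word-count vectors using cosine similarity. To be clear, this is a stand-in chosen for brevity, not the algorithm Siri uses; the `TRAINING` examples and the intent names `get_weather` and `call_contact` are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical training set: a few example utterances per intent.
TRAINING = {
    "get_weather": [
        "what's the weather outside",
        "is it going to rain later",
        "should I wear a scarf",
    ],
    "call_contact": [
        "call my mother",
        "phone aunt mary",
    ],
}


def bag_of_words(text: str) -> Counter:
    """Sparse word-count vector for a piece of text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(utterance: str):
    """Return the intent of the most similar training example and its score."""
    query = bag_of_words(utterance)
    best_intent, best_score = None, 0.0
    for intent, examples in TRAINING.items():
        for example in examples:
            score = cosine(query, bag_of_words(example))
            if score > best_score:
                best_intent, best_score = intent, score
    return best_intent, best_score


print(classify("Do I need a scarf today?"))
```

Notice that nothing in this code knows what a scarf is. The request lands on the weather intent only because it shares words with an example that a human labeled as weather.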

But, you know, if you want to know what the weather is like outside you should probably just get your butt off your seat and look out the window. And Siri will happily tell you to do that, too, by the way. Most users want to know what the weather will be like later, or at Aunt Mary's place in New Jersey. This is why intents come with parameters. For the weather intent, these would be time and location. Only *after* Siri classifies a user's utterance as a specific intent does it go back to the sentence and look for these parameters in a process called Named Entity Recognition.
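Here is a toy version of that second pass for the weather intent. Real Named Entity Recognition relies on trained models; this sketch swaps in a simple lookup against two hand-written word lists, and `KNOWN_LOCATIONS`, `KNOWN_TIMES` and `extract_weather_parameters` are all names invented for the example.

```python
import re

# Hypothetical word lists standing in for a trained NER model.
KNOWN_LOCATIONS = ["philadelphia", "new jersey", "san francisco"]
KNOWN_TIMES = ["later", "today", "tonight", "tomorrow"]


def find_first(candidates: list[str], text: str):
    """Return the first candidate that appears as a whole phrase, if any."""
    for candidate in candidates:
        if re.search(rf"\b{re.escape(candidate)}\b", text):
            return candidate
    return None


def extract_weather_parameters(utterance: str) -> dict:
    """Fill the weather intent's two parameters: location and time."""
    text = utterance.lower()
    return {
        "location": find_first(KNOWN_LOCATIONS, text),
        "time": find_first(KNOWN_TIMES, text),
    }


print(extract_weather_parameters("Is it going to rain later in New Jersey?"))
print(extract_weather_parameters("What's the weather outside?"))
```

A parameter that comes back as `None` is the cue to fall back to a default, like the current location, or to ask the user a clarifying question.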

Every intent also has an action. In the case of asking for weather information, Siri will call The Weather Channel’s API, get the information and display it in a pretty way. Other intents have some cheeky built-in responses.
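One common way to wire intents to actions is a lookup table from intent name to handler function, sketched below. Everything here is hypothetical: `fake_weather_lookup` is a placeholder that returns a fixed value instead of calling any real weather API, and the intent names are made up.

```python
from typing import Callable, Dict


def fake_weather_lookup(location: str, time: str) -> str:
    """Placeholder for a weather service call; no real API is contacted."""
    return "sunny"


def weather_action(params: dict) -> str:
    """Fetch a forecast and turn it into a sentence."""
    location = params.get("location") or "your current location"
    time = params.get("time") or "now"
    forecast = fake_weather_lookup(location, time)
    return f"The forecast for {location} ({time}) is {forecast}."


def favorite_animal_action(params: dict) -> str:
    """A canned, built-in response that needs no external data."""
    return "Octopuses are really cool."


# Each intent name maps to exactly one handler.
ACTIONS: Dict[str, Callable[[dict], str]] = {
    "get_weather": weather_action,
    "favorite_animal": favorite_animal_action,
}


def execute(intent: str, params: dict) -> str:
    """Run the action for an intent, or admit defeat if there is none."""
    handler = ACTIONS.get(intent)
    if handler is None:
        return "Sorry, I can't help with that."
    return handler(params)


print(execute("get_weather", {"location": "Philadelphia", "time": "later"}))
print(execute("order_pizza", {}))
```

The fallback branch matters: any request that does not map to a registered intent ends in the same defeatist answer.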

You might have noticed that Siri will try to disambiguate your parameters, too. That’s generally clever, but can lead to, uh, awkward situations.

By the way, before Apple acquired Siri and baked it into their system, Siri was actually able to do a lot more things, like make a restaurant reservation and buy movie tickets. As the original developers described it, Siri was always meant to be a “do-engine”, not a “search-engine”. I guess the main reason Apple decided to dumb it down is that they want to deliver a consistent user experience for all customers, and while it’s easy enough to programmatically order movie tickets in the US, other markets are more fragmented, so it’s hard to keep that up in every other country. Amazon has a very different approach there. Let’s build it for San Francisco first, fuck the rest of the world.

There are several other clever pieces of technology that give Siri an edge, of course. But none of these details matter much to understand how current assistant AIs work. Turn the text into numbers. Classify these numbers into an intent. Go back to the text and find parameters for the intent, or ask for clarification. Execute the action. Turn the result back into a sentence and send a response.

And that’s not just how Siri works, but basically all modern chat-like interfaces to AI. The problem is, it’s very easy to tell that Siri is not a human. For example, she seems to suffer from a bad case of anterograde amnesia. When I tell her that octopuses are my favorite animal, she won’t remember it for a single moment. By the way, octopuses are really cool.

Well, that seems like an easy thing to fix, right? But what about all the other times Siri just has no idea what you want from her? When you get a frustrating defeatist response to a seemingly mundane request? When you phrase something just slightly different and suddenly Siri has no idea what you’re talking about?

Here’s the problem: as long as there’s an odd chance that Siri just won’t be able to help you with your request, her usage will be very, very limited to a few tasks that you know she can handle. What does it take to create an actually useful artificial assistant? Do we just have to create more and more intents? Gather more and more data? Or do we need a different approach altogether?

I’ve been talking an awful lot about Siri so far, and that’s of course because everybody knows Siri. But hey, the other big players have their own artificial assistants. Microsoft has Cortana, the little voice in Amazon’s Echo devices is called Alexa, and Google just announced their Echo competitor Google Assistant this week. By the way, can we just give Google some credit for NOT sticking to the industry trend of giving all artificial assistants female names? Why do even our robots have to follow sexist stereotypes?

Now, when looking at technologies or start-ups with investor eyes, I usually put them on a scale somewhere between pure entertainment and bleak utility. And of course we can do this with Siri and other chat-like interfaces too, be it voice or chatbots. On the very left, we’ve got bots like Mitsuku, which is the most popular bot on the messaging platform Kik. If you don’t know what Kik is, it’s probably because you’re over 20. Branded bots are also somewhat popular in terms of number of user interactions, at least among teenagers. On the right, you have completely utilitarian bots like 1-800-Flowers or ASSIST, that let you order flowers or burritos. Here’s a little secret: bots on the far right of the spectrum are generally not doing so well. They’re not retaining their users, and the perceived value is very low.

There are, however, a number of bots, or more generally chat-like services, that are completely utilitarian, but still have a growing, happy user base and even turn a profit!

# Hybrid AI

For example, Clara of ClaraLabs and x.ai’s Amy both schedule meetings. It’s simple, really: if I want to meet you for lunch next week, I’ll just send you an email and put Clara or Amy on CC, and they will automatically reach out to you and suggest a few times when I’m free based on my Google Calendar. You can just reply and she’ll put an event on my calendar, or you can propose a different date. Clara and Amy will handle that effortlessly, and I won’t even see the conversation between you and them. I can even ask Clara to reschedule all of my meetings for that morning if I wake up feeling sick or, uh, hungover, and she’ll automatically reach out to everybody, always kind and polite. I seriously don’t know how I could live without them.

And then there’s Pana. Pana does all my travel bookings. I text Pana where I want to go and when, and within seconds it gives me three or four reasonable choices for flights. I tell Pana which one I like most, and wait for my boarding pass in my inbox. It also books hotels, rental cars, changes or cancels my flights and does a lot of other things around travel. Penny is your personal finance assistant and tells you whether or not you can afford this new TV set.

It seemed like the AI revolution had finally arrived. But then there was an article that claimed that Clara was made of people! Scandalous! Comparisons were drawn to the 18th century “Mechanical Turk”, a seemingly robotic automaton that could play chess so well it even beat humans. But in reality there was a little chess prodigy inside the automaton controlling its arms. Not coincidentally, Amazon now runs a service called Mechanical Turk with which you can outsource small tasks to human workers via an API. And people accused Clara of doing exactly that: not being a real AI at all, but using an army of human workers to come up with responses to the user. Are you guys familiar with Searle’s Chinese Room thought experiment? Well, it looks like we’re just dealing with Mechanical Turks in a Chinese Room!

And Clara was not the only one. In fact, all the companies and products I mentioned have that in common: they’re not pure AI, but somewhere have humans in the loop. So what, is it all a big hoax?

I think not. On the contrary, something incredibly clever is happening here, and it’s called Hybrid AI. A combination of human and artificial intelligence. That works in two ways: let’s take Clara and Amy as an example again. When I ask them to schedule a meeting, their built-in AI will immediately generate a response. If the system is very confident about its assessment of the situation and the response it generated, it might automatically send this out to the user. But otherwise, the response is presented to human workers for approval, often with one or two alternative choices. That means that there’s always a human in the loop, and the output of the system will always have a human-level quality. But the same human worker can now handle ten or twenty times as many clients as they could if they had to write all of the emails themselves!
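The routing decision at the heart of this can be sketched in a few lines of Python. This is a guess at the general shape, not how Clara or x.ai actually implement it: the `Draft` class, the `route` function and the 0.95 threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical cutoff; a real service would tune this on its own data.
AUTO_SEND_THRESHOLD = 0.95


@dataclass
class Draft:
    """A machine-generated reply plus the model's confidence in it."""
    text: str
    confidence: float  # the model's own estimate, between 0 and 1
    alternatives: List[str] = field(default_factory=list)


def route(draft: Draft, threshold: float = AUTO_SEND_THRESHOLD) -> str:
    """Send confident drafts directly; queue the rest for human approval."""
    if draft.confidence >= threshold:
        return "auto_send"
    return "human_review"


confident = Draft("Tuesday at noon works. I've sent an invite.", 0.99)
unsure = Draft(
    "Would Thursday at 2pm work instead?",
    0.62,
    alternatives=["Would Friday morning work instead?"],
)

print(route(confident))
print(route(unsure))
```

The threshold is the business lever here. Raise it and more drafts go past a human, which costs more per message but catches more mistakes. Lower it and each worker covers more clients.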

And this is where our mythical beasts come in, the centaurs, half man, half horse. Remember when IBM’s Deep Blue beat chess grandmaster Garry Kasparov for the first time, in 1997? Boy was I excited. Nowadays of course every smartphone can beat a chess world champion. But Kasparov was, as always, ahead of his time, and suggested a new style of chess: Centaurs, human chess players and computers working together as a team. And it turns out that fusing human intuition with the ability to evaluate millions of moves almost instantly is a winning combination. Today, centaur teams of even average human players with average chess programs beat both human chess masters and the most advanced chess AIs.

There’s of course another clever thing about having humans in the loop: every time the humans have to course-correct, they train the AI a little bit more. Plus, you can figure out what your users will really ask for without having to confine yourself to a niche first. Facebook M, which is based on the technology of a company called wit.ai, uses that approach. And that’s very important, because right now it’s still technically impossible to build an all-purpose, domain-independent chatbot.
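A bare-bones way to capture those course corrections is to log every human decision as a labeled example, as in the sketch below. The `FeedbackLog` class and the intent names are hypothetical, and the retraining step itself is left out. This only shows how review work doubles as training data.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class FeedbackLog:
    """Collects human-approved labels as future training examples."""
    examples: List[Tuple[str, str]] = field(default_factory=list)
    corrections: int = 0

    def record(self, utterance: str, predicted: str, approved: str) -> None:
        """Store the human-approved intent; count it if the model was wrong."""
        if predicted != approved:
            self.corrections += 1
        self.examples.append((utterance, approved))

    def correction_rate(self) -> float:
        """Share of reviewed requests where the human overrode the model."""
        if not self.examples:
            return 0.0
        return self.corrections / len(self.examples)


log = FeedbackLog()
log.record("move my 10am", "schedule_meeting", "reschedule_meeting")
log.record("lunch next week?", "schedule_meeting", "schedule_meeting")

print(log.examples)
print(log.correction_rate())
```

A falling correction rate over time would suggest the model is learning from its reviewers, and the logged utterances show what users really ask for.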

So, in a sense, Hybrid AI is Humans pretending to be bots that pretend to be humans. The irony does not escape me. But that’s not the only application of hybrid AI.

One of the hot topics in medicine right now is pattern recognition for medical imaging. You know, there’s this stereotype that technology and robots will replace mostly blue-collar workers. But one group of people currently fearing for their jobs are radiologists. 13 years of higher education, earning upwards of 300k a year. Most of their job is staring at CAT scans, MRIs, PET scans, and looking for tumors and cancers and whatever else is going to eat you. Turns out, computers are really good at that too. But you don’t really want to leave it to an algorithm to put you on chemotherapy, and that’s why modern radiology is mostly a hybrid AI task. Computers pick out regions of interest in images, and humans validate and approve them. Radiologists are centaurs.

Alright, alright. So teams of humans and computers are much better than either of them alone. But what’s the big deal?

I believe that we're not just witnessing a technological breakthrough, we're witnessing a breakthrough in business logic. Because the technology that powers these AIs is not exactly new. Yes, we're currently making a lot of progress with deep learning and ever bigger data sources. But all of these changes are mostly incremental. What these companies have understood is that you don't need artificial intelligence to be so good, so authentic, so believable that you can automate work reserved for humans and replace your mammalian workforce. It's much more effective to use AI to augment your human workers, help them do the same tasks faster, better, and with less effort.

That in turn means that business models that were hitherto unfeasible suddenly become realistic. How many people can afford an executive assistant? Even on a time-share model, exec assistants like Zirtual still run at several hundred dollars a month — far too much to reach a broad consumer market. With the cost of human labor involved in a service going down dramatically, executive assistants, personal shoppers and stylists, finance advisors, nutritionists, travel planners, social media managers, matchmakers, and many other highly specialized and individualized services will become as natural and obvious as an Amazon Prime subscription.

So let’s wrap this up. Today we learned how Siri, Cortana, and other AIs capable of understanding natural language work, and why it took us so long to get there since the humble beginnings of AI research in the 1950s. We’ve seen how hybrid AI takes this to a new level and enables completely new businesses. And we’ve seen that centaurs - teams of humans and AIs - often outperform both expert humans and computers!

As a final note, I want to share a few pieces of advice for people who want to create their own Pure or Hybrid-AI products, or replace some of their human-powered processes with AI.

First, don’t. Follow Facebook’s approach: use humans to test out the market, and just pretend they’re bots. It will help you create a much better product and save you tons of money. It’s what Paul Graham would call “Do things that don’t scale”, just a little more bold.

Second, don’t simply think about how to automate things that currently require a lot of human effort. Automation is an important part of scaling companies, but it’s also the part of scaling where suddenly your quality of service goes down, you limit what your product can do, and you often make bad decisions by letting what you can automate dictate what the product can do. Rather, focus on how to maximize the human output. As we have seen, humans and AI work best as teammates, not adversaries. You’ll get better results by augmenting humans, not replacing them.
