Strategies for Conversational Product Experiences

UXAustralia
August 30, 2019

Transcript

  1. UX Australia 2019 (AUUXAU3008D) Main Room, Day 2 – 30th August, 2019

    PHILLIP HUNTER: How is everyone? Can you hear me? Yes, OK. Make sure we get all the right connections here. There we go, OK. Thank you, Kate. I've known Steve for a while and Kate and I first met in Seattle when she came to speak at a conference and I've gotten to watch her discussions of humanity emerge over the past couple of years, and I was really excited to be following her today because meaning is really the core of what I want to talk about too, and the meaning inherent in humanity. So we can see the slides. So let's talk about being human. So those of us who are blessed, or potentially cursed, with the ability to speak talk at about 16,000 words a day. Raise your hand if that seems high to you. Raise your hand if that seems really low. A few of you. Good, the honest ones. Not all of those are consequential but we do use a lot of words every day. In addition to that, teens and adults exchange anywhere from 30 to 100 SMS and app-type messages, so Facebook Messenger and the like, during our 3.5 days that we spend interacting with those devices. And then those of us who are, again, blessed, perhaps, to use email, have to interact with about 100 or more messages a day. Conversation is a very, very big part of our lives. Most of us, I think, would acknowledge that without conversation, we would accomplish very little. At the same time, conversation is not something - again, a show of hands. How many of you feel confident in every conversation that you are going into with a loved one? I want you to come up here. Going into conversation with a loved one, a boss, someone you've had a disagreement with, right? These are situations that we may feel confident in at this point in our lives and some of us may have worked very hard on that. At the same time, I would guess that most of you, even if you feel confident, don't necessarily always know why you're able to make it happen.
Some of us have read books around conversations and having tough ones, but it doesn't necessarily mean that, you know, we try to learn those lessons and then we practice them but it doesn't always mean that we know exactly how it happens. Conversation is very critical, and I bring this all up to talk about conversational interfaces not because we need machines to converse with us all the time, but because we are beings of meaning, we are beings of connection. We are beings that request and deliver information and when we interact with the machines in a conversational way, and by the way, when I say conversational, I don't mean just voice, I mean text too. I have worked in speech recognition applications for a long time but I also work with chatbots and they look at things like Messenger and Apple Business Chat and other mediums for exchanging conversations between companies and people. And so conversation is really about the richness of the things that we are communicating back and forth. We negotiate, we relate, we talk just to connect and often there's no more reason than just to connect, that we have a conversation with someone. It can be the reason we say good morning or it could be the reason why we talk about the weather, small talk, things like that.
So conversation, to take us even a further step back, conversation is the human interface. It is how we interface with each other. When we communicate with language, we are adopting something that most of us believe we share to some degree, which is an interface. The same thing that we build in our digital worlds. Pardon me while I try to read what's in front of me. Meaning requires multiple levels of context. Most of us, we're designers, we know this. We work with context. We work with content. We work with crafting flow. However, we don't often think about what makes up that meaning. It's culture, it's self-owned knowledge, it's imputed knowledge that we assume someone else has, it's tonal quality to our voice, gestures, other visual signals, the environment that we're in. All of those things bring meaning to each and every conversation that we have. We use conversation to engage and fulfil social contracts. I had many conversations about coming to Sydney to speak to you today. We have many conversations about getting work done. Those of us who are in long-term committed relationships will have many conversations about that relationship. Meaning goes deep and broad. Meaning shows up in diverse and wide-ranging ways. It shows up as story, persuasion, negotiation, teaching, comfort, poetry, debate, explanation, commiseration, resolution, play, inquiry, defence, entertainment, just a partial list of all of these things. And when we use it, it's not just recognition of words, of static definitions, it's detailed, it's complex, it's subtle, it's sophisticated, it's conceptual, it's inferential. The reason I'm talking to you today is our machines aren't doing that with us. We are ignoring these things by and large in our machines. You can see a quote here about the idea of mutual responsibility in conversation.
The participants are trying to establish at the initiation of each new interaction, each new contribution, the mutual belief that we understand each other, that we know what we're talking about. How many of you have had a conversation one day with someone and walked away from it thinking wow, they really got me, we were aligned on a course of action, we know what we're going to do. A week later the thing that gets done is not what you agreed on and you go back to them and all of a sudden it becomes clear that you had two entirely different understandings of what happened. That happen to anyone? Raise your hand. Yeah, more than a few of us. It's human. When we think about machines doing that, machines that are not necessarily built around meaning, machines or applications that are not designed to do that negotiation, think of all that we aren't even trying to do there. We have to work so hard with each other, and our machines aren't even built yet to do this. And another example of why this is so important, and I don't usually like to - I take that back, I always have too much text on my slides. This is a Twitter exchange I came across just yesterday. It's remarkable this showed up, I was so pleased. A woman who is a speech language pathologist, meaning that she works with often children but sometimes adults as well on language issues, and often the mechanics but not just the mechanics. More importantly the delivery and reception of meaning.
So you can scan this but essentially she was on a flight, she sat next to a father who spoke a different language than her, and his son, and his son had severe language pathologies. It's so cool because she gave the gift of meaning to a little boy. This is our desire, right, as humans, is to find meaning, like Kate said, to communicate meaning. It's not easy. We'll talk about an example of that. Has anyone ever spoken these words or heard them spoken to them? "I really need to see you." A few of you? Probably more. If you see this in a text message from someone that you are romantically involved with and it's got this emoji, you're thinking this could be a lovely evening. However, if a few minutes later you check your email and it comes from your manager, maybe you're definitely going to need that evening because your afternoon is not going to be so great. On a more serious note, if you get this in a voice message from your doctor, this could be life changing. The same words, relatively simple, with a well-understood gist from context, and our conversational technology is bypassing these sorts of subtleties, not taking these things into account right now. However, we're after that. We want to go there and yes, there are so many things around that, and I'm sure you've heard of the issues with Amazon and Apple and Google listening to recordings and not controlling as well, so privacy and sanctity of data remain serious issues to take care of. At the same time we aren't taking care of the human issues of connecting around meaning. Now, just in case you're not familiar with these underlying bits of the technology. An utterance, when someone says something to a machine we call it an utterance. It's a form of input, similar to opening an app or pressing a button to start something. It's the indication of the willingness to act. The prompt is what the system presents back, an output.
It could be the classic definition of prompt where it's a question, soliciting an input. Or it could be giving information, presenting the weather in this case. Intent is something that we gather using natural language processing. It's an indication of what this person wants, either as a direct statement, "OK, Google, give me weather and traffic", or it could be indirect, "OK Google, what's my morning information?" Maybe I've set that up already or maybe Google knows from patterns of behaviour prior to that. A slot is a variable detail. You could say "give me weather and traffic" and Google will interpret that as today. You could say give me weather for tomorrow, weather for next Wednesday. Those are slots, variables. And then a turn is the pair, an input and an output or in some cases an output and then an input. This is what we do with each other as well. So I don't point this out. I'm not going to go into a great deal of technical detail of how these systems work but it's important to understand that they're mimicking what we do with each other already. We walk up to someone and we ask them a question or we give them a piece of information.
Now, the interesting thing about talking about all this, too, is that we're changing the definition of conversation from something that happens between two or more individuals, two or more humans, into things that are inanimate, the Alexa Echo. Fred Allen, an old US comic writer, wrote a bit about this. SPEAKER: Tell me what are your feelings about the... SPEAKER: I ... SPEAKER: You have a radio? SPEAKER: I had a one. PHILLIP HUNTER: There's a character there that says I don't hold with furniture that talks. My furniture is not supposed to make noises at me. As we heard yesterday, the strange can be strange until all of a sudden we're used to it and all of a sudden it's useful to us or it's acceptable. Now we have these devices that are in our homes for some of us, or many of us have them on our phones as well and they begin to have usefulness to us or at least some desirability. So some of you may be familiar with the American game show Jeopardy, which is a quiz show where there's a special rule that you have to give the answer in the form of a question. So if I say something like the only continent that's also a country, you would say what is Australia. And it's been a big hit for Amazon Alexa, primarily because the show is a big hit and had a very devoted following so they developed what is called an Alexa skill around that so people could play just like the game is played. So Alexa will present six answers to you and you have to respond in the form of a question and the game will disallow your answer if you don't respond that way. So it's programmed to do that. And people love it. Any Jeopardy fans out there? I can't even really - OK, a few. So if you are a Jeopardy fan, you know that game, when you're watching it, it is not watched silently, right? You're yelling your answers at the TV, convinced that Alex Trebek can hear you as well if only you had a buzzer.
So this was a runaway hit from day one when it came out because it lets fans play just like on the television and it's aligned with that day's television content, even more importantly. And the highest rate of game play is immediately following the showing in the local area. Google Home Hub, someone told me, it's been renamed the Nest Hub, which just begins to mix all sorts of interesting metaphors together. It's designed to sort of capitalise on a number of factors. Like the iPhone, it's not doing anything net new but it's bringing things together in a way that hadn't been brought together before and I know a bunch of the folks at Google who work on these sorts of things and they do a lot of work to bring that together in a way that makes sense and that helps make a home feel more productive and things like that. So what does it mean to design for this? When you're a designer of conversational interactions, the first thing that you realise is that you're anticipating one half of a conversation. You don't know how - you're guessing at how the other person is going to respond. And, now, speech recognition is a non-deterministic technology, meaning everything is a guess, it's maybe a very, very well-educated and well-refined guess but it's still a guess. There is no such thing as 100% accurate recognition. Even for very small domain systems. Now, one of the reasons for that is that people speak in a non-deterministic way. I could start spouting jokes, dad jokes up here, of which I make many, and they're all terrible. You might laugh at them but you would be wondering why has Phillip changed tack and taken this thing into a very bad comedy routine. But people do that frequently in these systems. We design them for one thing believing a set of usage will occur and then people will use them in different ways, and I don't mean abusive ways, but things will occur to them to say that we didn't anticipate and then we have to revise that. But your job is to design that one half of that conversation. And it is a collaborative negotiation as we talked about a moment ago where we're trying our best to see what we can have the application understand. I'm not going to go through the rest of these but you can see that some of these things are very similar to what you already think about in terms of user experience design. Some of these, like an exceptional appreciation of the variation of language, might stretch your boundaries of comfort. I know a fair number of designers, sadly, who aren't comfortable writing, who aren't comfortable communicating with words. They prefer the lines and the colour, which are all important. But when it comes to designing voice, designing conversation, you have to learn how to deal with words and, yeah. So what I want to take you through are five technical strategies for conversation. Now what I mean by technical strategies is you can think about this in terms of product, you can think about this in terms of approaches that you might take. I'm going to give you some examples of these. I'm going to give you some practical tips.
I'm only going to get you - I'm not even going to get you - I'm definitely not throwing you into the deep end, maybe I'm just splashing you with a bit of speech and conversational thinking here. But we'll go through these. So the first one here is obviously a play on Steve Krug's book. There's a tendency we have when we design these systems to think that the person who is using it is thinking. Now, it's not necessarily that we're trying to make them think but we do think that they're thinking. But we don't, really. We don't actually think much when we talk. Now that might seem odd to some of you and make perfect sense to others of you. Part of the challenge is that we're really good at this. We're really good at conversation and basically at the same time as someone else is saying something to us and we are processing that information in milliseconds, we are also constructing our response, also in milliseconds. Otherwise, every conversation would be like, "Hey, did you see the weather today?" "Yes. What about it?" "Well, I noticed it was raining." "You did? What about that?" But we don't do that, right, we're back and forth, we're quick. The thing is that when we interact with someone else, I should say, we're pretty sure, through getting to know them, even in milliseconds, even in a few seconds, even meeting someone for the first time, we're pretty sure of the assumptions that we make about what that person can and can't do. With systems we don't have that. We don't know what the system is and isn't aware of. So one of the challenges we have, as designers, is keeping track of what's happening and understanding that the other person is also trying to keep track of what's happening. In the end, we're left with silence before the next step. So you saw the words flash on the screen, that may seem unusual. I'm not going to show them to you again because with speech, once it happens, it's gone unless we ask someone to repeat them. If we do that enough they get annoyed and walk away. Now our speech systems don't, fortunately, have the gift of getting annoyed. They get to do whatever we tell them to do but at the same time when we're presenting information with a speech system, we're presenting it to people who are dealing with potential memory overload or distraction. We're dealing with people who have a need for predictability, for familiarity, and then we also have to think about the fact that we have to be flexible with content and context. And we have to realise that conversation requires constant maintenance and repair. Repair is a fascinating topic. If you've interacted with a speech system you will have heard it say "I didn't get that." "Did you say?" It's all stilted and it is because of the poor way of doing those sorts of things. However we do them in conversation all the time, we call them repair and it's part of that negotiation. It could be something like "I didn't catch what you just said" or "I kind of get what you're saying but if you can give me an example or say it in a different way, I want to try to make sure I understand it." This repair happens about every 84 seconds in almost every language that is spoken on Earth. This study came out of a university here in Australia and it's such a crucial part and such a common part of our conversation. We don't even usually know that we're doing it. We just do it.
We have very, very highly crafted, highly skilled, highly practised ways of doing that. But our speech systems aren't designed that way yet. But we need them to be so we need to think about how do we design in ways that provide clarity about what to do and why. We need to plan for misinterpretation, we need to plan for spontaneous changes of direction. We need to be careful with open-ended prompts. You might have encountered systems that say "How may I help you" and certainly Alexa and Google are, by definition, open-ended because they're sitting there passively until you address them. Here's something we did recently at the company I work for where we had trouble with "How may I help you" because "How may I help you" invites almost anything. And one of the things it was inviting here was "I want to talk to a person." I have no problem getting someone to a person but at the same time automation has its benefits and I won't go into that but one of the things I talked about with our team was well, we're asking this question that has nothing to do with actually inquiring what they're calling about. We're just saying "How may I help you". It's a standard greeting. I'm splitting hairs but it's an important hair. So changing just subtly to "What are you calling about today", all of a sudden "I'm calling to talk to a person" doesn't ring true because I'm not. I'm calling to get to a problem.
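The vocabulary from earlier in the talk (utterance, prompt, intent, slot, turn) and the advice here to plan for misinterpretation can be illustrated with a small sketch. It is not how any real assistant works: the keyword table, the `Turn` class and the `interpret` function are hypothetical names invented for illustration, and a production system would use trained language-understanding models rather than keyword matching. The sketch only shows how one turn pairs an utterance with a prompt, how a slot can default to "today", and how a failed match can be answered with a repair prompt that says what the system can do instead of an open-ended question.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword table standing in for a trained language model.
INTENT_KEYWORDS = {
    "weather": ("weather", "forecast"),
    "traffic": ("traffic", "commute"),
}

# Slot pattern: the day being asked about.
DAY_PATTERN = re.compile(r"\b(today|tomorrow|next \w+)\b")


@dataclass
class Turn:
    utterance: str  # input: what the person said
    intents: list   # what the system guesses they want
    slots: dict     # variable details, here only the day
    prompt: str     # output: what the system says back


def interpret(utterance: str) -> Turn:
    """Build one turn: guess intents and slots, then choose a prompt."""
    text = utterance.lower()
    intents = [name for name, words in INTENT_KEYWORDS.items()
               if any(word in text for word in words)]
    # The day slot falls back to "today" when the person names no day.
    match = DAY_PATTERN.search(text)
    slots = {"day": match.group(1) if match else "today"}
    if intents:
        prompt = f"Here is the {' and '.join(intents)} for {slots['day']}."
    else:
        # Repair: state what is possible rather than asking an open question.
        prompt = "I can give you weather or traffic. Which would you like?"
    return Turn(utterance, intents, slots, prompt)


if __name__ == "__main__":
    for said in ("OK Google, give me weather and traffic",
                 "Give me weather for next Wednesday",
                 "Why?"):
        print(interpret(said).prompt)
```

The repair branch is the part that matters for the argument: a narrower prompt tells the person what the system is capable of, which an open-ended greeting does not.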
Now, so that's this idea of, you know, let's help people through the process. Don't make them think - don't assume that they're thinking. There's also the human connection and for those of you who saw Tripp's talk yesterday, we don't have any other way of relating to things that aren't us except to have inanimate objects and animate objects and on top of that we really apply the, let's just say it this way, we love other people. I know some of you are probably misanthropes. But we do crave human presence. And we apply these cravings to pretty much everything. Raise your hand if you've never yelled at your computer. OK, I thought so. Now raise your hand if your computer yelled back. Right. So we know this, right? We don't expect the computer to yell back but we do this, we treat everything as a person, we put faces on toast, all of these sorts of things that aren't real but that's just the way we're wired. It kept us alive, it kept us evolving. We just meet for coffee and we talk and we want to exchange things and we have little conversations like "The coffee's so good here." "It really is." Now, there's seven words in that conversation and those seven words are replacing all of these words. This is what's really being talked about in this conversation. And actually that's an incomplete list. I wanted it to be somewhat legible. I'm not going to read that to you. Imagine what we get into when it's more complicated, more complex, more involved. Decades ago, a guy by the last name of Grice observed why we replace that long list with those seven words and so he had these maxims. He observed that we engage in quality, truthfulness, honesty, clarity. We engage with quantity, just enough information. We engage with relevance, things that make sense in the moment. And then we engage with manner, that's appropriate between the parties that are speaking.
And if you have children, if you've ever taught, ever interacted with a child as an adult, you know you talk to the child differently and the child is not necessarily aware of that or they may say "That person is doing grown up talk right now", especially depending on what the language is coming out of the person's mouth. The idea here is that there are things that make effective conversation and Grice observed these for us. Now, we also have principles and guidelines that we can apply to our designs and so things like brevity, focus, ease, helpfulness, these are important things for us to say how do we create the kinds of conversations that we would like to have and that other people will find useful, other people will find meaningful. Now, a third thing here is something that's very near and dear to my heart right now. It's the idea that our systems don't treat us well, these conversational systems. We could say the same is true of many applications, many websites, but when it's voice, when it's conversation, it feels more personal. Conversation is intimate. Conversation is addressed to us. When it's an application we can just say "I hate that app." We don't necessarily feel offended by that app but if something says something to us, especially if it's a machine in our home that we expect to help us, then it becomes very difficult. I'm going to jump out of this for a moment and show you all something that hopefully I kept in the right spot. No, of course I didn't, why would I do that? There it is. Do we know why I wouldn't have audio right now? (Video plays) SPEAKER: Alexa play a song by the (inaudible). SPEAKER: (Bleep). PHILLIP HUNTER: These folks are amazing. They're extremely patient. I worked at Alexa. I probably would have put that device in the fireplace after about half of what these people dealt with. Now, I show you that because a place that I find a great deal of frustration is when I see articles like these. I've been asked about this at this conference and I know that some of you are parents, and parents are worried about their children becoming rude because of a device in their home where it's hard to say you need to say please and thank you. Sure enough, in response to articles like this, Alexa and Google have introduced things like Pretty Please, and I can't remember the one that's on Alexa where it requires you to say please to get something. Now I hate that. I just want to be really clear. It's not that I'm anti-politeness, I taught my children to be polite, caring towards people who were treating them well. Now, I don't necessarily mean that. You should be polite to a person even if they're being rude to you. However, my point is these systems are not designed to be good to us. I'm not saying they're designed to be bad to us, but they're not specifically designed to be good to us. So my response to these sorts of things right now is the applications we get right now are neither good people nor good machines. So I'm not going to worry about being polite to something until I see that the level of care for me is also being given and I can tell you it's not there. Now, there are hard constraints here, like time limits for attention, input duration, things like that, but there are also things like hard-to-discover offerings. I don't know what a system or a machine or application necessarily is capable of.
I don't see anything. It's not telling me what it can do. There's invisible narrowing. I issue a command and all of a sudden I could do fewer things than I could before but I don't know that. And then there are just the terrible ways that failure is handled and you've all experienced it. We talked about it earlier. If you say "Alexa, why?" "I don't know what you're referring to" or "I can't handle that right now" or things like that. Sure, that's a weird thing to say but there are basic conversations that we can have with each other and my fiancee is travelling here with me and she has gotten to know what I'm sure you're all familiar with as the designer sigh. I'm on a website or I'm on an application and all of a sudden (Sighs). "What's wrong, baby?" "I can't - you would - if I could strangle a product manager right now, I would." Now, I'm not saying I design perfectly, I've made plenty of bad decisions. In fact I've been in this field so long I made some of the original mistakes. I'm trying to learn from them and do better. But handling failure is critical and turning failure into not failure is critical and we have to do that before we can really say the machines are treating us well.
Then lastly, the idea that our systems and applications tend to be designed in a very MVP style, as is true in many parts of our digital world, but with voice, with conversation, there's almost a mandate, the system has to be smarter than the situation it's handling. You think about this. It's sort of the idea of a container and we don't know - I can see this glass and I could know that I could probably pour an entire one of those bottles of water into it. But I would know that I can't pour more than, probably, one, certainly not two complete bottles. But we don't know that with these invisible systems. We don't know what they're capable of and they should be capable of two bottles of water if they need to be, if that's our situation. So that's something that needs to happen. We can know this, we can design for this, we can ask the right questions about the people who will be using these systems and these are some of the questions, there are many more, you are all UX professionals, these are not unusual questions to ask but we don't see them asked well enough and followed up well enough yet in our current state of voice technology. And then there's some things around even what we ask. So once we have these answers what do we do with them? Perhaps it's, you know, we ask questions of it, we interrogate, we critique. Does this prompt sound like a human would say it spontaneously? I can tell you that "The system requires your account number to be put into it" is not a natural way to say "What's your 10-digit account number?" Do we design things to be conscientious? If it's not conscientious, if it doesn't feel like it could be spontaneous, how do we change it? Does the prompt connect?
I hear so many of these systems where each prompt sounds like the prior one was never spoken, or, you know, in other words, the systems say things like "In order for me to access your account, what's your 10-digit account number" blah, blah, blah. "I will now access your account using your 10-digit account number." What did I think was going to happen? Was it necessary to say that? Another thing that happened at home, I tried to get a recording of this, and I apologise. We play a lot of music on our devices and our device tends to respond in a very wordy way in the same way every time: "Now playing such and such", "Playing on such and such device", "Playing from Amazon Music". I'm not going to tell you the company. So recently, I gave one of these queries and before that long thing, they added the term "Got it". I thought well, "Got it" is cool but you can't say the rest of it now. Got it means got it. What else does it mean? You don't say got it and then explain what you got. I mean, then don't say got it. Anyway, OK, a couple more things. Language is alive. We need to understand that language changes. How many of you would consider - now be brave - how many would consider yourself a grammarian? Several of you, great. I have bad news for you. Your time is limited. What do I mean by that? I'm not saying - grammar is great because grammar helps us with a lot of things around meaning. You know, it's the whole Oxford comma debate, and your Uncle Jack, you probably heard that one. What I'm saying is language is dynamic, it changes, it moves, it emerges from different groups.
What you think of as misuse is not actual misuse, it's not laziness, rudeness, signs of an impending apocalypse. Words can die, structures can die, structures can change, new words are amazing. Change is a sign of richness. Gretchen McCulloch published a book recently, she's an Internet linguist and she's saying to the world look at how amazingly evolved we are as a species by analysing the language. Even the LOLs and the OMGs, it's all really amazing. One of the other amazing things that I came across recently. How many of you are familiar with Mr Rogers, or not familiar? Mr Rogers is a worldwide phenomenon, probably the best thing America ever did, seriously. He had this set of rules that his writers called Freddish about how to phrase things to children. I looked at these and I thought these are amazing, they're beautiful, they're lovely. So I adapted them to say how could we write better prompts, better actions for people using machines? Now these are not perfect but they're a start and what really encourages us is iteration towards better. There's not just a fixed way of saying any one thing. So revise and imbue care and thoughtfulness. The last thing I'll bring up is, yea, though I talk in the uncanny valley of the shadow of the Death Star. One of my favourite things when I watched the 'Star Wars' movies was to make that noise. The dialogue was so terrible. What I encourage you to do is to take everything that I've said, but also realise that this is a craft, this is a - how many of you are visual designers and you studied hard and went to art school and you loved the subtleties of a gradient and the perfect rounded corner and all that. Do the same with voice, do the same with conversation, words, craft.
Yesterday we talked about deep fakery things and that's important and the closer we get to realism, sometimes you get worried about whether this is going to be - will people even know - Google Duplex, will they be fooled? It's not about that. It's about us feeling treated like a person, that we can behave like a person and we don't have to learn a set of different rules. So bring in the storytellers, bring in the language crafters, the screenwriters, build things people love and want to interact with. Thank you. (Applause)