
UXA2022_Day 1_ Lydia Penkert - Designing for everyone? Voice Interaction in public spaces

uxaustralia
August 25, 2022


Have you ever wondered how the answers of voice devices or conversational robots are designed? And what should be considered when designing inclusive interactions in public spaces?

In this talk I will present a case study of an iterative interaction design approach for a voice-based service robot in a museum. You will gain an understanding of the complexity of public spaces and the challenges of designing inclusively for diverse user groups, along with helpful methods to complement your design process.


Transcript

Note that this is an unedited transcript of a live event and therefore may contain errors. This transcript is the joint property of CaptionsLIVE and the authorised party responsible for payment and may not be copied or used by any other party without authorisation. www.captionslive.com.au | [email protected] | 0447 904 255

UX Australia 2022 – Hybrid Conference, Thursday, 25 August 2022. Captioned by: Kasey Allen & Carmel Downes
LARA PENIN: Thank you. Thanks for inviting me. (APPLAUSE)

STEVE BATY: We now have a morning tea break. We are going to break for 30 minutes, until 10.30. Head out through the doors and enjoy the coffee. Check out the sponsor stands. Grab a Deloitte lolly bag - I saw them packing them this morning, they look fantastic - and we will see you back here at 10.30. Thank you. (MORNING TEA)

STEVE BATY: Welcome back. Come on in, grab a seat. Our next speaker is joining us virtually from Germany. Lydia Penkert will be talking about designing voice interactions in public spaces. I'm going to apologise to Lydia, because I actually thought she was in the US, and so I scheduled her for the morning - and it's 2:30 in the morning for her. So I'm really, really glad that she didn't get too upset with me about that. I did the wrong thing, so there you go. Please join me in welcoming Lydia, who is joining us from Germany. Thank you.

LYDIA PENKERT: Hello everybody. I'm pleased to be here. Hello everybody from Germany - like Steve said, it is quite late in the evening for me, but the end of the day for you. So I hope you're having a great start to the conference. I will start the presentation.

All right. So, hello everyone from my side; I hope you are having a great start. Today I would like to invite you into the topic of voice interaction. Think back to the last time you really used voice interaction, like Siri or Alexa on your smartphone: you asked a question, and maybe something came back that didn't really feel natural - not like a normal human conversation.
You may have asked, "Okay, how is this actually designed? What is the process behind it?" So today I want to give you some insight into what is really behind voice interaction, what needs to be considered when designing it, and a use case of a social robot at the Ozeaneum.

I'm Lydia, and I am working as a UX researcher at Trivago, but I started my PhD at the University of Applied Sciences in Cologne, so that is where this project comes from. To give you some context, this talk is based on a project called SKILLED from the university, and the aim of this project is to investigate human-machine interaction for service and travel. You will see different robots and different virtual avatars, which are not tangible, and the goal is to achieve a system for service and travel where social robots can communicate with humans on an equal footing, provide customer service to us, and in this way support service employees, for example.

The specific use case I want to share with you today was inside a German museum called the Ozeaneum, and as the name suggests, it is about the ocean. It is a really huge exhibition about the ocean, with fish tanks like this one, and our goal was to provide customer service to museum visitors in a different way. So we deployed this service robot - a social robot with a conversational AI programmed by the team - and it was able to answer several different types of questions: for example, "Where is the toilet?" and orientation inside the building, but also other things such as "What is the biggest animal in the ocean?" - so knowledge about animals and about the exhibition itself - and also something really important in the domain of virtual assistants: small talk. Many users try to test the limits of the AI behind the robot or the voice assistant, and ask questions like, "How old are you? Why are you here? What's your name?"
To make a conversation with a human feel natural, it is important that a conversational AI is able to have this type of conversation as well.

So far so good? It may seem a bit simple, but what is the challenge in implementing that? First of all, the big challenge here is the context: public spaces. Think about the last time you were, for example, in a train station or at an airport, or even look at this image - what stands out? There are many different users, many different humans, and that creates a lot of challenges. Imagine you are standing in the middle of a street festival and you want to use a voice assistant on your phone. Probably difficult - you will really encounter some challenges: maybe it doesn't understand you, or it is too loud, or there are other conversations going on around you. Even more challenging: imagine a robot inside this crowd providing customer service, like explaining where to go next.

So there are definitely challenges in this environment, and the first one I will mention is noise. Noise is both auditory and visual. There are other conversations which do not belong to the conversation going on between the voice interface and the user, but they are still there, so on the technology side it is important that the system can really distinguish between the main conversation and the other conversations around it. But there is visual noise too: imagine the system is doing face detection of the human who is actually speaking, and then somebody else just passes by - somebody in a hurry who walks between the robot and the user. Then the system needs to be able to recover and say, "Okay, that is the same user", and not start all over again by saying "Hello" and greeting them as a new user. The system needs all of these capabilities for handling noise.
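To make that concrete, here is a minimal sketch of the "same user after an occlusion" check, assuming the vision stack already produces face embeddings. The threshold, the embedding size and the function names are illustrative assumptions, not details of the project's actual system.

```python
import numpy as np

# Illustrative cutoff; a real system would tune this per camera and model.
SAME_USER_THRESHOLD = 0.8

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def resume_or_greet(current_embedding, last_embedding, conversation_state):
    """Decide whether a re-detected face belongs to the same visitor.

    If someone briefly blocked the camera, the face that reappears should
    resume the running conversation instead of triggering a fresh greeting.
    """
    if last_embedding is not None and \
            cosine_similarity(current_embedding, last_embedding) >= SAME_USER_THRESHOLD:
        return conversation_state               # same visitor: carry on
    return {"turn": 0, "greeted": False}        # new visitor: start over

# Hypothetical usage, with random vectors standing in for real embeddings:
rng = np.random.default_rng(0)
known = rng.normal(size=128)
reappeared = known + 0.01 * rng.normal(size=128)  # same face, slight noise
print(resume_or_greet(reappeared, known, {"turn": 3, "greeted": True}))
```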
Along with noise, there are also data privacy concerns and restrictions. It depends on the country, of course, but in public spaces it is really difficult to collect and store data from passers-by or visitors of the place without their prior consent. That imposes a lot of limitations on what the system is capable of doing - for example, motion detection, or detection of a person per se - so that limits what the AI can actually do in that context.

Along with this, there is the question of user acceptance of this technology in public spaces, especially because it is something new, and there are aspects users may not be comfortable with, especially in public. If you imagine getting a new Alexa, or testing out Siri for the first time, you do that at home: you are alone, and you might be figuring out how it actually works. But once you do that in a public space, where everybody passes right next to you, that is one factor that really changes the acceptance of the technology itself. There is also unwanted attention from passers-by: a robot is still new in most countries, so I can guarantee that many people will stop and just observe an interaction, and that is an effect which potential users might anticipate.

Looking again at the picture, and thinking about the last time you were in a place like this: there is diversity of users, and also changing users. If you go to an airport, you may be there only once, so the AI and the technology are not really able to adapt to their users over time; they also need to be adapted to the diversity of users. And then the challenge starts: okay, how do we actually define diversity, for the AI but also for us? According to which personal, physical or social restrictions do we define it? It needs to be defined in order to implement it for the AI in this case.

I will give you a bit of a spoiler of what happened when we actually deployed this robot at the museum. You see here one big family or visitor group which came all together, and already here there is diversity among the users: you see different ages, genders and heights among the users who were standing in front of the robot.
Maybe also different expertise or previous knowledge about how to actually use voice interaction - and all of this needs to be addressed in order to really have a meaningful interaction. Additionally, it is not only single users who are diverse and changing; groups are diverse as well. It is designing for diversity of groups.

To approach that, we started by looking into all the data we could get about the visitors - who is our user, and what do we know about them, first of all. Additionally, I was shadowing customer service, which was interesting as well: I was sitting next to them, counting and looking at which types of users actually come in. The outcome was what we call user group types - and groups, as I said, because visitors would mostly come along together. So mostly it was group interaction: families, couples, extended families. We differentiated basically based on age, because that was one of the things that was easiest to implement with the technology available, and we saw that there was always, for example, one common user type across the different age groups. So basically our user types were adults, children of different ages, and elderly users, but mostly the ages in between.

We started with adults first, because that was the one user type which was common in all the different constellations: no matter if it was a couple, a family, or a group of children and an adult, there was always an adult. Why did we decide to focus on one specific user type and not start with all of them? Because we wanted to take an iterative approach, and because of time limitations - as you all know, sometimes there is a due date you really have to meet. We also looked into what is feasible from the technology side, because voice technologies are definitely optimised for particular user types. Speech detection for children is really difficult, because there is hardly any training data
for the AI, due to data privacy and the concerns of parents about recording children. That is one factor. Another is that if you think of how a five-year-old speaks - probably not with correct grammar, mixing up verbs or the order of the sentence - that makes it really difficult for speech detection and intent recognition to actually understand what the child is trying to say. To come back to the topic: it is actually a huge field in itself. So we decided to start with adults as the first user type and look into the interaction design.

We looked into three main questions, which we then repeated for all the different user types. The first one was: what does the robot need to know? When I say robot, you can say voice assistant, voice device or chatbot - a chatbot is not voice interaction, but the interaction is similar to what you would do with voice. The second one was: how should it behave? What should the mimicry and the personality be? And the third: how to communicate the interaction rules. By interaction rules I mean that in human-human conversation we have rules we sometimes don't even notice, but which make conversation work. For example, if I am talking and don't make any pauses in between, my conversation partner would probably not interrupt me, because I don't give any signal that I'm finished. However, if I pause, or say, "What do you think?" with my facial expression, that is a signal to my counterpart that it is his or her turn in the conversation. These types of rules are also needed for a robot-human conversation. However, if it is maybe the first time you are actually speaking to a robot, then it is difficult to know how you should communicate with it.

So, for the knowledge part, we started with the research approach of field observation - that is basically walking and sitting around for quite a long time and listening: listening to
conversations between visitors, between children and their parents, between customer service and visitors - catching what the topics of conversation are, what questions visitors have, and what customer service is repeatedly asked. Part of that is also collecting interesting data: here I discovered in the museum shop, for example, a visitor guide that was perfect, because it contained questions and answers about the museum - exactly the type of data we wanted to look into. Then the stakeholder workshop, really participatory, drawing on exactly the expertise of customer service: what are the questions that visitors keep asking them?

The outcome was the content and knowledge base. You can imagine that as a huge Excel sheet with a lot of different questions, different ways to phrase them, and the respective answers. When you speak to the robot, it detects the different words, infers the intent you most likely have, looks into the database for anything matching, and gives the respective answer. So basically we defined a huge question/answer table, and we also implemented some functions, like "Okay, what time is it actually?" or "How is the weather outside?", which required a bit more coding at that point.

It is also important to select questions which are both frequent and easy to answer. Even if a question is frequent, if it is user-dependent, or the answer is a really long one, it is not optimal for a voice interface, because it is really tough to remember a lot of information when it is only spoken. It is easier to read and understand that information visually and textually than to listen to the voice interface talking for three minutes about something specific.
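As a rough illustration of that question/answer table, here is a minimal sketch of intent matching by keyword overlap, plus function-backed intents like asking for the time. The data, the scoring and the names (KNOWLEDGE_BASE, DYNAMIC_INTENTS) are hypothetical examples, not the project's implementation, which is not public.

```python
from datetime import datetime

# Hypothetical knowledge base: keyword sets for example phrasings of a
# question, each mapped to a canned answer (illustrative data only).
KNOWLEDGE_BASE = [
    ({"where", "toilet"}, "The toilets are next to the main entrance."),
    ({"biggest", "animal", "ocean"},
     "The blue whale is the biggest animal in the ocean."),
    ({"how", "old", "you"}, "I am quite new here, so still very young!"),
]

# Dynamic intents call a function instead of returning a canned answer.
DYNAMIC_INTENTS = {
    frozenset({"time", "it"}): lambda: f"It is {datetime.now():%H:%M}.",
}

def answer(utterance: str) -> str:
    """Score entries by keyword overlap and return the best-matching answer."""
    words = set(utterance.lower().replace("?", "").split())
    for keywords, handler in DYNAMIC_INTENTS.items():
        if keywords <= words:
            return handler()
    best_score, best_answer = 0, "Sorry, I did not understand that."
    for keywords, canned in KNOWLEDGE_BASE:
        score = len(keywords & words)
        if score > best_score:
            best_score, best_answer = score, canned
    return best_answer

print(answer("Where is the toilet?"))   # knowledge-base lookup
print(answer("What time is it?"))       # function-backed intent
```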
In the next step, we looked into the behaviour of the robot. This is one of my favourite pictures of the whole observation on site. We looked into what the non-verbal behaviour should be, because it is a key aspect of human interaction: voice alone lacks what I am doing right now, for example, with my hands, or smiling, so we need help in making the interaction more lifelike. As voice gestures, you can imagine sounds like "hmm", "mhm" or "okay" - sounds which don't add any meaning, but which signal that I am still in the conversation. For that we started with desk research, because there is already a lot of research about it. The outcome was a selection of non-verbal behaviours - the type of robot we used here already has quite a lot of mimicry, and we selected the behaviours we found most appropriate for this case - and a list of proactive greetings. In a case like the one in this picture, you see the two children are kind of sceptical about what is actually happening and whether they should talk to the robot, and then the mother approaches it. What should a robot do in this case? Should it ask them something and say hello? That was the type of question we were looking into: when and how the robot should be proactive. That was also based on observations on site.

This is the actual view from the robot - the world view of the robot. That is me standing in front of it, and it detects that there is a person inside its conversation circle, which is why it is paying attention to them; that is the blue dot. As you see here, I was standing right in front of the robot and quite close to it. That distance is also culture-specific: the comfortable distance for talking to someone varies between cultures, and in this case users were maybe also not really sure how to interact with this technology.
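As a rough sketch of that conversation circle, here is a distance-based attention check with a dwell time before a proactive greeting. The radii, the dwell threshold and the names are made-up illustrations; as described next, the real deployment had to widen its detection circle.

```python
from dataclasses import dataclass
import math

@dataclass
class DetectedPerson:
    x: float  # metres, with the robot at the origin
    y: float

# Illustrative radii only. Too close degrades speech pickup; the outer
# radius is the one the team had to widen after observing real visitors.
MIN_DISTANCE = 0.4
MAX_DISTANCE = 2.5

def in_conversation_circle(p: DetectedPerson) -> bool:
    """Is the detected person at a workable conversation distance?"""
    return MIN_DISTANCE <= math.hypot(p.x, p.y) <= MAX_DISTANCE

def should_greet(p: DetectedPerson, frames_inside: int,
                 dwell_frames: int = 30) -> bool:
    """Greet proactively only once a person has lingered inside the circle,
    so the robot does not greet every passer-by."""
    return in_conversation_circle(p) and frames_inside >= dwell_frames

print(should_greet(DetectedPerson(1.2, 0.5), frames_inside=45))  # True
print(should_greet(DetectedPerson(3.5, 0.0), frames_inside=45))  # too far
```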
What we noted on site was that users were actually standing farther away than we expected, which led to the problem that the person detection was not detecting them: even though they were standing there talking, the system would say, "Okay, they are too far away; they are not interacting with me." So we extended that circle. That would be the best case, we would say, but that is what actually ends up happening with diverse users and user groups: different users stand at different distances, or, on the contrary, users get too close - they kept approaching the robot more and more because they had the feeling it was not understanding them, while from the technology side they were simply too close for an interaction, which is also an interesting case.

For this aspect there is also the relevance of having interaction rules for conversation. Here the approach was usability testing plus on-site observation, and that is really the difference: the usability test in the lab was one-to-one communication, quiet, at a fixed distance, while on site it was totally different. What we implemented was a red/green light system that gives guidance to the user, so that they know when they can talk and when the robot is not listening, to guide the conversation (sketched below). Additionally, on site we printed the instructions and printed out some example questions to really start off the conversation.

To come back to the use case: we focused on one small part, the adults - the adults in that group; more user types are still to come. Here - I'm not sure if it is visible - the mother is actually involving a child who was too small to even reach the robot, so that is an aspect of inclusion regarding height. However, that is something we held back on a bit: we wanted to avoid children climbing on the robot or breaking it, so there was this trade-off between how much we can include children and still have a functional robot afterwards.
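Coming back to the red/green light guidance mentioned above, here is a minimal sketch of it as a listen/think/speak state machine driving the light. The states, events and transitions are hypothetical, chosen only to show the idea.

```python
from enum import Enum, auto

class RobotState(Enum):
    LISTENING = auto()   # green light: the visitor may speak now
    THINKING = auto()    # red light: the utterance is being processed
    SPEAKING = auto()    # red light: the robot is giving its answer

def light_for(state: RobotState) -> str:
    """Map the dialogue state to the guidance light shown to visitors."""
    return "green" if state is RobotState.LISTENING else "red"

# Hypothetical events driving the turn-taking loop:
TRANSITIONS = {
    (RobotState.LISTENING, "utterance_end"): RobotState.THINKING,
    (RobotState.THINKING, "answer_ready"): RobotState.SPEAKING,
    (RobotState.SPEAKING, "answer_done"): RobotState.LISTENING,
}

state = RobotState.LISTENING
for event in ("utterance_end", "answer_ready", "answer_done"):
    state = TRANSITIONS[(state, event)]
    print(f"{event} -> {state.name} (light: {light_for(state)})")
```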
What the team is looking into now is really iterative design: implementing the learnings and testing a different embodiment. The next step is to test with a virtual agent - an avatar - so it can adapt more and better to the height of users and simply scale down or up depending on who is in front of it. That is also something we will investigate: how the appearance and embodiment of such a voice can help make it more inclusive.

To sum up, public spaces are highly relevant for inclusion, because they are public: all users have access to the technology or the system we are building. Therefore it is really important to observe and iterate, to keep making the product and system you are building more inclusive every time, and so to include everybody in your design and in your system. And with that, I'm really looking forward to your questions. Thank you for your attention. (APPLAUSE)

STEVE BATY: Thank you so much. This is where it gets a little weird, where I have to see myself. I have a question, if I may - I haven't seen any question come through from the Q&A panel, so apologies if I'm missing somebody else's question. I'm curious about how the participants in these interactions can be clear, when there's more than one of them, about who is being listened to. I saw some diagrams there about how the robot and the interface identify different voices, but how can that feedback be given back to the participants? Lydia?

LYDIA PENKERT: In that case we are also looking into how this plays out visually - for example, showing from the technical side that there is a purple dot, so the user knows the robot is actually paying attention to them. That is a visibility aspect we want to look into, but we are also looking into making it normal in that sense: saying "hello" and starting the conversation, making it human and natural, in the same way that a human would
react to that - really making it like a human conversation.

STEVE BATY: I find it a fascinating exercise in both the interaction design, which is how you've phrased it, and also the interface design, and there are elements of both which, in the context you are designing for, are really quite complicated. It is a really nice challenge that you are working on.

LYDIA PENKERT: Yeah, thank you. Definitely. With the embodiment there are a lot of elements that already play into it: into inclusion, how it is perceived, and how the communication actually flows. Also on the voice side: which voice to choose was quite a big topic of discussion within our team, because there are many different voices - neutral ones and generalist voices - which are interesting to look into. There are many different dimensions to it.

STEVE BATY: My last question is around language, and whether or not the system is able to detect different languages and respond in kind.

LYDIA PENKERT: Right now it is mostly in German, but we aim to do it in different languages as well - English and other translations as they come. We are first looking specifically into the German version and then translating it for other countries as well.

STEVE BATY: That is wonderful. Lydia, thank you so much.

LYDIA PENKERT: Thanks for having me. (APPLAUSE)