
UXA2022_Day 1_ Lydia Penkert - Designing for everyone? Voice Interaction in public spaces

Have you ever wondered how the answers of voice devices or conversational robots are designed? And what should be considered when designing inclusive interactions in public spaces?

In this talk I will present a case study of an iterative interaction design approach for a voice-based service robot in a museum. You will gain an understanding of the complexity of public spaces and the challenges of designing inclusively for diverse user groups, along with helpful methods to complement your design process.

uxaustralia
August 25, 2022

Transcript

Note that this is an unedited transcript of a live event and therefore may contain errors. This transcript is the joint property of CaptionsLIVE and the authorised party responsible for payment and may not be copied or used by any other party without authorisation.
www.captionslive.com.au | [email protected] | 0447 904 255

UX Australia 2022 – Hybrid Conference
Thursday, 25 August 2022
Captioned by: Kasey Allen & Carmel Downes

LARA PENIN: Thank you. Thanks for inviting me. (APPLAUSE)
STEVE BATY: We now have a morning tea break. We are going to break for 30 minutes, until 10.30. Head out through the doors and enjoy the coffee. Check out the sponsor stands. Grab a Deloitte lolly bag - I saw them packing them this morning and they look fantastic - and we will see you back here at 10.30. Thank you.
(MORNING TEA)
STEVE BATY: Welcome back. Come on in, grab a seat. Our next speaker is joining us virtually from Germany. Lydia Penkert will be talking about designing voice interactions in public spaces. I'm going to apologise to Lydia, because I actually thought she was in the US and so I scheduled her for the morning - and it's 2:30am in the morning for her. So I'm really, really glad that she didn't get too upset with me about that. I did the wrong thing, so there you go. Please join me in welcoming Lydia, who is joining us from Germany. Thank you.
LYDIA PENKERT: Hello everybody. I'm pleased to be here. Hello everybody from Germany - like Steve said, it's quite late in the evening for me, but the end of the day for you, so I hope you're having a great start to the conference. I will start the presentation. All right. So, hello everyone from my side. Today I would like to invite you into the topic of voice interaction. Think back to the last time you really used voice interaction, like Siri or Alexa on your smartphone: you asked a question, and maybe something came back that didn't really feel natural, not like a normal human conversation.

You may have asked, "Okay, how is this actually designed, and what is the process behind it?" So today I want to give you some insight into what is really behind voice interaction, what needs to be considered when designing it, and a use case of a social robot at the Ozeaneum. I'm Lydia, I am working as a UX researcher at Trivago, but I started my PhD at the University of Applied Sciences in Cologne, and that is where this project comes from. To give you some context, this talk is based on a project called SKILLED from the university, and the aim of this project is to investigate human-machine interaction for service and travel. You will see different robots and different virtual avatars, which are not tangible, and the goal is to achieve a system for service and travel where social robots can communicate with humans on an equal footing, provide customer service to us and in this way support service employees, for example.

The specific use case I want to share with you today was inside a German museum called the Ozeaneum and, as the name suggests, it is about the ocean. It is a really huge exhibition about the ocean, with fish tanks like this one, and our goal was to provide customer service to museum visitors in a different way. So we deployed this service robot - a social robot with a conversational AI programmed by the team - and it was able to answer several different types of questions: for example, "Where is the toilet?" and orientation inside the building, but also other aspects such as "What is the biggest animal in the ocean?" - so knowledge about animals and about the exhibition itself. And there is something else that is really important in the domain of virtual assistants: small talk. Many users try to test the limits of the AI behind the robot or the voice assistant and ask questions like, "How old are you? Why are you here? What's your name?" To make a conversation with a human feel natural, it is important that a conversational AI is also able to have this type of conversation.

So far so good? It may seem a bit simple, but what is the challenge in implementing that? First of all, the big challenge here is the context: public spaces. Think about the last time you were, for example, in a train station or at an airport, or even just look at this image - what stands out? There are many different users, many different humans, and that creates a lot of challenges. Imagine you are standing in the middle of a street festival and you want to use the voice assistant on your phone. That is probably difficult, and you will encounter some challenges: maybe it doesn't understand you, or it is too loud, or there are other conversations going on around you. Even more challenging, imagine a robot inside this crowd providing customer service, like explaining where to go next. So there are definitely challenges in this environment.

The first one I will mention is noise, which here is both auditive and visual. Auditive noise is other conversations that do not belong to the conversation going on between the voice interface and the user but are still there, so from the technology side of voice there has to be a way to really distinguish the main conversation from the other conversations around it. But there is also visual noise. Imagine you have face detection of the human who is actually speaking, and then somebody else just passes by - somebody in a hurry who walks between the robot and the user. Then the system needs to be able to recognise, "Okay, that is still the same user" and not start all over again by saying "Hello" and greeting them as a new user. All of these capabilities for handling noise need to be there.
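The talk only states this requirement, not how it is met. Purely as an illustrative sketch - the face embeddings, the cosine-similarity check and the timeout below are assumptions, not the project's actual person-tracking pipeline - keeping track of the current conversation partner across a brief occlusion could look roughly like this:

```python
import time
import numpy as np

class ConversationPartner:
    """Remembers the current visitor so that a brief occlusion (someone walking
    between the robot and the user) does not restart the dialogue with "Hello"."""

    def __init__(self, similarity_threshold: float = 0.7, timeout_s: float = 10.0):
        self.embedding = None        # face embedding of the current user, if any
        self.last_seen = 0.0
        self.threshold = similarity_threshold
        self.timeout_s = timeout_s

    def observe(self, face_embedding: np.ndarray) -> str:
        """Return 'same_user' to continue the dialogue, or 'new_user' to greet."""
        now = time.monotonic()
        if self.embedding is not None and now - self.last_seen < self.timeout_s:
            similarity = float(np.dot(self.embedding, face_embedding) /
                               (np.linalg.norm(self.embedding) * np.linalg.norm(face_embedding)))
            if similarity >= self.threshold:
                self.last_seen = now
                return "same_user"   # the same visitor reappeared: do not greet again
        # No one tracked, the track expired, or a different face: treat as a new visitor.
        self.embedding = face_embedding
        self.last_seen = now
        return "new_user"
```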
Along with this, there are also data-privacy concerns and restrictions. It depends on the country, of course, but in public spaces it is really difficult to collect and store data about passers-by or visitors of the place without their previous consent.

That imposes a lot of limitations on what the system is capable of doing - for example motion detection, or detection of a person at all - so it limits what the AI can actually do in this context. Along with this there is the question of user acceptance of this technology in public spaces, especially because it is something new, and there are aspects users may not be comfortable with, especially in public. If you imagine getting a new Alexa, or testing out Siri for the first time, you do that at home; you are alone and you might explore how it actually works. But once you do that in a public space, where everybody passes right next to you, that is one factor that really changes the acceptance of the technology. There is also unwanted attention from passers-by: a robot is still new in most countries, so I can guarantee that many people will stop and just observe an interaction, and that is an effect which potential users might anticipate.

And looking again at the picture, and thinking about the last time you were in a place like this: there is a diversity of users, and the users are constantly changing. If you go to an airport you may be there only once, so the AI and the technology are not really able to adapt to individual users over time, and they also need to be adapted to the diversity of users. And then the challenge starts: how do we actually define diversity, for the AI but also for ourselves? According to which personal, physical or social restrictions do we define it? It needs to be defined in order to implement it for the AI in this case. I will give you a bit of a spoiler of what happened when we actually deployed this robot at the museum. You see here one big family, a visitor group which came all together, and already here there is diversity among the users: different ages, genders and heights in front of it.

Maybe there is also different expertise or previous knowledge about how to actually use voice interaction, and all of this needs to be addressed in order to really have a meaningful interaction. Additionally, it is not only individual users who are diverse and changing; the groups are diverse as well. It is designing for a diversity of groups. To approach that, we started by looking into all the data we could get about the visitors: who is our user and what do we know about them, first of all. Additionally, I was shadowing customer service, which was interesting as well - I was sitting next to them, counting and looking at which types of users actually come in - and the outcome was what we call user group types. And also groups, as I just said: mostly it was group interaction - families, couples, extended families.

When we started here we differentiated basically based on age, because that was one of the things that was easiest to implement with the technology available, and we saw that there was always, for example, one common user regarding the age groups. So basically we found that our user types were adults, children of different ages and elderly users, but mostly the ages in between. We started with adults first because that was the one user type which was common to all the different constellations: no matter if it was a couple, a family or a group of children and an adult, there was always an adult. Why did we decide to focus on one specific user type and not start with all of them? Because we wanted to take an iterative approach, because of time limitations - as you all know, sometimes there is a due date you really have to meet - and because we looked into what is feasible from the technology point of view: voice technologies are definitely optimised for particular user types. Speech detection for children is really difficult because there is hardly any training data for the AI, due to data privacy and the concerns of parents about recording children.

That is already one factor, and another one is this: if you think of how a five-year-old speaks, it is probably not grammatically correct - they mix up verbs or the order of the sentence - and that makes it really difficult for speech detection and intent recognition to actually understand what the child is trying to say. So, to come back to the topic, it is actually a huge field in itself, and we decided to start with adults as the first user type and look into the interaction design.

We looked into three main questions, which we then repeated for all the different user types. The first one was: what does the robot need to know? When I say robot, you can also say voice assistant, voice device or chatbot - a chatbot is not voice interaction, but the interaction is similar to what you would do with voice. The second one was: how should it behave - what should the mimics and the personality be? And the third: how do we communicate the interaction rules? With interaction rules I mean that in a human-human conversation there are rules which we sometimes don't even notice but which make the conversation work. For example, if I am talking and don't make any pauses in between, my conversation partner would probably not interrupt me, because I don't give any signal that I am finished; however, if I pause, or if I ask "What do you think?" with my mimic, then that is a signal for my counterpart that it is his or her turn in the conversation. These types of rules are also needed for a robot-human conversation. However, if it is maybe the first time you are actually speaking to a robot, then it is difficult to know how you should communicate with it.

So, for the knowledge part, we started with the research approach of field observation - basically walking and sitting around for quite a long time and listening: listening to the conversations between visitors, between children and their parents, and between customer service and visitors.

We were basically catching what the topics of conversation are, which questions visitors have and which questions customer service gets asked repeatedly. Part of that is also collecting interesting data: in the museum shop, for example, I discovered a visitor guide that was perfect because it contained questions and answers about the museum - exactly the type of data we wanted to look into. Then there was a stakeholder workshop, really participatory, to get the expertise of exactly those customer service staff: what are the questions visitors keep asking them? The outcome was content and a knowledge base - you can imagine that as a huge Excel sheet with a lot of different questions, the ways they can be phrased, and the respective answers. When you speak to the robot, it detects the different words in what you said and the intent you are likely to have, looks into the database to see whether anything matches, and gives the respective answer. So basically we defined a huge question-and-answer table and also implemented some functions, like "Okay, what time is it actually?" or "How is the weather outside?", which required a bit more coding at that point.
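The matching itself is described only at this high level. As a minimal sketch of such a question/answer lookup - the table contents, intent names and the crude word-overlap scoring are all invented for illustration, not taken from the project - it could look like this:

```python
import re
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Entry:
    intent: str
    phrasings: list[str]  # example user questions that map to this intent
    answer: str           # canned answer text ("" if computed dynamically)

# Illustrative excerpt of the question/answer table ("huge Excel sheet") described above.
KNOWLEDGE_BASE = [
    Entry("find_toilet", ["where is the toilet", "where are the restrooms"],
          "The toilets are on the ground floor, next to the museum shop."),
    Entry("biggest_animal", ["what is the biggest animal in the ocean"],
          "The blue whale is the largest animal in the ocean."),
    Entry("smalltalk", ["how old are you", "why are you here"],
          "I'm here to help you find your way around the exhibition."),
    Entry("ask_time", ["what time is it", "which time is it"], ""),
]

def dynamic_answer(intent: str) -> str | None:
    """Some intents need a little code instead of a canned answer."""
    if intent == "ask_time":
        return f"It is {datetime.now():%H:%M}."
    return None

def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))

def score(utterance: str, phrasing: str) -> float:
    """Crude similarity: fraction of the phrasing's words found in the utterance."""
    p = words(phrasing)
    return len(words(utterance) & p) / len(p) if p else 0.0

def answer(utterance: str, threshold: float = 0.6) -> str:
    """Match the utterance against the table and return the best answer, or a fallback."""
    best_entry, best_score = None, 0.0
    for entry in KNOWLEDGE_BASE:
        entry_score = max(score(utterance, q) for q in entry.phrasings)
        if entry_score > best_score:
            best_entry, best_score = entry, entry_score
    if best_entry is None or best_score < threshold:
        return "Sorry, I didn't understand that. Could you rephrase your question?"
    return dynamic_answer(best_entry.intent) or best_entry.answer

print(answer("Excuse me, where is the toilet?"))  # -> directions to the toilets
print(answer("Which time is it actually?"))       # -> the current time
```

A production system would use proper intent recognition rather than word overlap, but the flow - match the utterance, fall back politely when nothing fits, and call a function for dynamic answers such as the current time - follows the description above.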
It is also important to select questions which are both frequent and easy to answer. Even if a question is frequent, if the answer is user-dependent or really long, it is not a good fit for a voice interface, because it is really tough to remember a lot of information if I am only listening to speech. Visually and textually it is easier to read and understand that information than if the voice interface talks for three minutes about something specific.

In the next step we looked into the behaviour of the robots. This is one of my favourite pictures from the whole observation on site. We looked into what the non-verbal behaviour should be, because non-verbal behaviour is a key aspect of interaction for humans: a voice alone cannot do what I am doing right now with my hands, for example, or smiling, so we need something that helps make the interaction more lifelike.

As voice gestures you can imagine sounds like "hmm", "mhm" or "okay" - sounds which don't add any meaning but signal that I am still in the conversation. For that we started with desk research, because there is already a lot of research about it, and the outcome was a selection of non-verbal behaviours - the type of robot we used here already has quite a lot of mimics, so we selected the ones we found most appropriate in this case - and also a list of proactive greetings. Because in a case like the one in this picture, you see the two children are kind of sceptical about what is actually happening and whether they should talk to the robot or not, and then the mother approaches it. What should the robot do in this case? Should it ask them something and say hello? That was the type of question we were looking into: when and how the robot should be proactive. That was also based on observations on site.

Here you see the world view of the robot - the view from its side. That is me standing in front of it, and it detects that there is a person inside its circle, its conversation circle, and that is why it is paying attention to them. That is the blue dot. As you can see, I was standing right in front of the robot and quite close, but we discovered that users were standing further away from it. That is also something that is culture-specific: if I am talking to you, for example, the distance might be larger or smaller than in other cultures, and in this case people were maybe also not really sure how to interact with this technology. What we noted was that users were actually standing further away than we expected, which led to the problem that the person detection was not detecting the users even though they were standing there talking - the system would say, "Okay, they are too far away; they are not interacting with me."

So we extended that circle.
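As a rough sketch of that conversation circle - the radii, the Person structure and the state names here are assumptions rather than the project's actual values - the engagement check and its extension could look like this:

```python
import math
from dataclasses import dataclass

@dataclass
class Person:
    x: float             # metres, in the robot's coordinate frame
    y: float
    facing_robot: bool   # e.g. derived from face detection

# Illustrative radii only: in the field, users stood further away than expected,
# so the original circle had to be extended.
ORIGINAL_RADIUS_M = 1.2
EXTENDED_RADIUS_M = 2.0
TOO_CLOSE_M = 0.4        # users leaning in so close that detection struggled

def engagement_state(p: Person, radius: float = EXTENDED_RADIUS_M) -> str:
    """Classify a detected person relative to the conversation circle."""
    dist = math.hypot(p.x, p.y)
    if dist < TOO_CLOSE_M:
        return "too_close"   # politely ask the user to step back a little
    if dist <= radius and p.facing_robot:
        return "engaged"     # attend to this person (the "blue dot" in the world view)
    if dist <= radius:
        return "nearby"      # candidate for a proactive greeting
    return "ignore"          # passer-by outside the circle

# A visitor about 1.6 m away, looking at the robot: inside the extended circle,
# but missed with the original, smaller radius.
visitor = Person(x=1.1, y=1.2, facing_robot=True)
print(engagement_state(visitor))                            # -> "engaged"
print(engagement_state(visitor, radius=ORIGINAL_RADIUS_M))  # -> "ignore"
```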
That, we would say, would be the best case, but this is what actually ends up happening with diverse users and user groups: you have different users who stand at different distances, or, on the contrary, users who were too close and kept approaching the robot more and more because they had the feeling it was not understanding them, while from the technology side they were simply too close for an interaction - which is also an interesting case. For this aspect there is also the relevance of having interaction rules for the conversation. Here the approach was a usability test plus on-site observation, and that really showed the difference: the usability test in the lab was one-to-one communication, it was quiet, and there was a fixed distance for the interaction, while on site it was totally different. What we implemented here was a red/green light system that gives guidance to the user, so that they know when they can talk and when the robot is not listening, to guide the conversation more.
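How the red/green light was wired up is not covered in the talk; the sketch below only illustrates the idea, with an invented set_light stand-in for whatever indicator the robot actually provides:

```python
from enum import Enum

class DialogueState(Enum):
    LISTENING = "listening"   # the robot is ready for the user to speak
    THINKING = "thinking"     # an utterance is being processed
    SPEAKING = "speaking"     # the robot is talking and not listening

def set_light(colour: str) -> None:
    """Hypothetical stand-in for the robot's actual light/indicator API."""
    print(f"[light] {colour}")

def show_state(state: DialogueState) -> None:
    """Map the dialogue state onto the red/green guidance light for the user."""
    set_light("green" if state is DialogueState.LISTENING else "red")

show_state(DialogueState.SPEAKING)   # -> [light] red   ("please wait")
show_state(DialogueState.LISTENING)  # -> [light] green ("you can talk now")
```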
Additionally, on site we also printed out the instructions and some example questions to really start off the conversation. To come back to the use case and to focus on one small part - the adults, the adults in that group; there are more user types to come - here, I'm not sure if it is visible, but the mother is actually involving the child, who was too small to even reach the robot, so that was one aspect of inclusion regarding height. However, that is something we actually held back on a bit: we wanted to avoid children climbing on the robot, grabbing it or breaking it, so there was this tension between how much we can include children and still have a functional robot afterwards. So what the team is looking into now is really iterative design: implementing the learnings and testing a different embodiment.

The next test is with a virtual agent - an avatar - which could adapt better to the height of users and actually scale down or up depending on where they are. So we will also investigate how the appearance and embodiment of such a voice interface can help make it more inclusive.

To sum up, public spaces are highly relevant for inclusion, because they are public: all users have access to the technology or the system we are building there. Therefore it is really important to observe and iterate, to keep making the product and the system you are building more inclusive every time, and in that way include everybody in your design and in your system. And with that, I'm really looking forward to your questions, and thank you for your attention.
    (APPLAUSE)
STEVE BATY: Thank you so much. This is where it gets a little weird, where I have to see myself. I have a question, if I may - and I haven't seen any question text come through from the Q&A panel, so apologies if I'm missing somebody else's question. I'm curious about how the participants in these interactions can be clear, when there's more than one of them, about who is being listened to. I saw some diagrams there about how the robot and the interface identify different voices, but how can that feedback be given back to the participants? Lydia?
LYDIA PENKERT: In that case we are also looking into how this plays out visually - for example, showing the user something like the purple dot from the technical view, so that they can see the robot is actually paying attention to them. That is a visibility aspect we want to look into. But we are also looking into making it feel normal, like saying "hello" when starting the conversation - making it human and natural, in the same way a human would react, really making it like a human conversation.

STEVE BATY: I find it a fascinating exercise in both the interaction design, which is how you've phrased it, and also the interface design, and there are elements of both which, in the context you are designing for, are really quite complicated. It is a really nice challenge that you are working on.
LYDIA PENKERT: Yeah, thank you. Definitely. Like with the embodiment, there are a lot of elements that already play into it - into inclusion, how it is perceived and how the communication actually flows. And also from the voice side: which voice to choose was quite a big topic of discussion within our team, because there are many different voices, like neutral ones and more generic voices, which are interesting to look into. There are many different dimensions to it.
STEVE BATY: My last question is around language, and whether or not the system is able to detect different languages and respond in kind.
LYDIA PENKERT: Right now it is mostly in German, but we aim to do it in different languages too, also in English and other translations as languages come. We are specifically looking first into the German version and then translating it for other countries as well.
STEVE BATY: That is wonderful. Lydia, thank you so much.
LYDIA PENKERT: Thanks for having me. (APPLAUSE)
