a smart speaker.
• +128% since January 2017 (NPR and Edison Research)
• 50% of all searches will be voice-driven by 2020 (comScore)
• Nearly 100 million smartphone users will be using voice assistants in 2020 (eMarketer)
Fjord
By not worrying about acting like a computer, you can theoretically:
• Save time and money
• Be more efficient
• Decrease socially deviant behaviors like staring at your iPhone
or eyes-free environment
• Shared interfaces for IoT or ambient devices without screens nearby (e.g. Nest)
• Languages that are hard to type
• Complicated things that people can articulate (e.g. Apple TV: “Give me thriller movies with Nicolas Cage, for free. Only show the ones with four or more stars”)
inspectors from paperwork, keeping them mobile and efficient.
• The system prompts the user, then records the inspector’s response (such as the presence of pests, the growth progress of fruit and vegetables, etc.).
• It tags the data with a timestamp and geo-location, then uploads it to a secure cloud for transcription and automated report creation.
http://www.agvoiceglobal.com/
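AgVoice’s actual API is not public; as a rough sketch of the tag-and-upload step described above, the flow might look like this (all names, such as `InspectionRecord` and `tag_response`, are illustrative, not AgVoice’s):

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json

@dataclass
class InspectionRecord:
    """One voice-captured field observation, tagged for later transcription."""
    audio_ref: str   # pointer to the recorded audio clip
    prompt: str      # question the system asked the inspector
    timestamp: str   # ISO-8601 UTC capture time
    lat: float       # geo-location at the moment of capture
    lon: float

def tag_response(audio_ref: str, prompt: str, lat: float, lon: float) -> InspectionRecord:
    """Stamp a recorded response with time and location metadata."""
    return InspectionRecord(
        audio_ref=audio_ref,
        prompt=prompt,
        timestamp=datetime.now(timezone.utc).isoformat(),
        lat=lat,
        lon=lon,
    )

def to_upload_payload(record: InspectionRecord) -> str:
    """Serialize a record as JSON for upload to the transcription service."""
    return json.dumps(asdict(record))

# Example: tag one observation from the field
rec = tag_response("clip-0042.wav", "Any sign of pests?", 36.77, -119.42)
payload = to_upload_payload(rec)
```

The point of the pattern is that the inspector only speaks; the metadata that makes the report auditable (time, place, prompt) is attached automatically.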
may be under-equipped to handle:
- Body language
- Auditory (syntax and non-syntax) cues
- Human conversational expectations
- Power dynamics
- Speed
- Capability
- Flexibility and pivot
understanding non-technical content
- Verbal elements create roadblocks
- Anger, frustration
- Tone
- Semantic confusion
Losing the visual interface means increased expectations of “humanity”
Alexa: “Sorry, I misunderstood,” based on perceived agitation in the user’s voice
as designers? What types of relationships are we promoting? How does that influence the way humans interact with voice interfaces?
- Mirror
- Authority and subordinate (and vice versa)
- Peer to peer
- Guide and guided
- Co-captains (Michael Knight and K.I.T.T.)
What does this mean for the way we manage other verbal interactions?
retain default female voices? What do designers convey when we default to this?
Why are we the keepers of that decision? What if you could roll your own?
user terms -- both volunteered and recognized
Semantics -- meaning
Syntax -- structure
Accent
Dialect (accent plus syntax plus semantics)
Who leads the charge on iterative design and revision?
shape their interactions with other humans? Is voice a training tool?
The elderly: are we considering the role of compassion and companionship for the lonely? What are the ethics behind using family members’ voices in assistants?
Neurodiverse users: factual responses and a lack of non-verbal cueing can ease the way for autistic users
new syntax
Streamlining voice interaction
• Role of “earcons”
• “Alexa Brief”: we change how we communicate in order to accommodate the computer’s needs
• Also streamlining: reduction of “wake” words for Alexa and Google to create a more seamless and efficient interaction
a human model for HCI interactions
Instead, emphasize haptics and audio cues (“earcons”)
Could address multiple issues:
- Shifts us away from human-human conversational expectations
- Greater accessibility for disabled users
- Reduction in voice-based bias
- More flexibility for users placed under temporary limitations
voice… Audio “earcons,” haptics, visual cues, and physical gesture can form a communication ecosystem that reduces bias and allows optimal accessibility. As potential voice designers, maybe we won’t be designing for voice.