a smart speaker.
• +128% since January 2017 (NPR and Edison Research)
• 50% of all searches will be voice-driven by 2020 (comScore)
• Nearly 100 million smartphone users will be using voice assistants in 2020 (eMarketer)
Fjord
By not worrying about acting like a computer, you can theoretically:
• Save time and money
• Be more efficient
• Decrease socially deviant behaviors like staring at your iPhone
or eyes-free environment
• Shared interfaces for IoT or ambient devices without screens nearby (e.g. Nest)
• Languages that are hard to type
• Complicated things that people can articulate (e.g. Apple TV: “Give me thriller movies with Nicolas Cage, for free. Only show the ones with four or more stars”)
inspectors from paperwork, keeping them mobile and efficient.
• The system prompts the user, then records the inspector’s response (such as the presence of pests, the growth progress of fruit and vegetables, etc.).
• It tags the data with a timestamp and geo-location, then uploads it to a secure cloud for transcription and automated report creation.
http://www.agvoiceglobal.com/
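AgVoice’s actual API is not public; as a rough sketch of the tag-and-upload step described above, the flow might look like this (all names, such as `InspectionRecord` and `tag_response`, are illustrative, not AgVoice’s):

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json

@dataclass
class InspectionRecord:
    """One voice-captured field observation, tagged for later transcription."""
    audio_ref: str   # pointer to the recorded audio clip
    prompt: str      # question the system asked the inspector
    timestamp: str   # ISO-8601 UTC capture time
    lat: float       # geo-location at the moment of capture
    lon: float

def tag_response(audio_ref: str, prompt: str, lat: float, lon: float) -> InspectionRecord:
    """Stamp a recorded response with time and location metadata."""
    return InspectionRecord(
        audio_ref=audio_ref,
        prompt=prompt,
        timestamp=datetime.now(timezone.utc).isoformat(),
        lat=lat,
        lon=lon,
    )

def to_upload_payload(record: InspectionRecord) -> str:
    """Serialize a record as JSON for upload to the transcription service."""
    return json.dumps(asdict(record))

# Example: tag one observation from the field
rec = tag_response("clip-0042.wav", "Any sign of pests?", 36.77, -119.42)
payload = to_upload_payload(rec)
```

The point of the pattern is that the inspector only speaks; the metadata that makes the report auditable (time, place, prompt) is attached automatically.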
may be under-equipped to handle:
- Body language
- Auditory (syntax and non-syntax) cues
- Human conversational expectations
- Power dynamics
- Speed
- Capability
- Flexibility and pivot
understanding non-technical content
- Verbal elements create roadblocks
- Anger, frustration
- Tone
- Semantic confusion
Losing the visual interface means increased expectations of “humanity”
Alexa: “Sorry, I misunderstood,” based on perceived agitation in the user’s voice
as designers? What types of relationships are we promoting? How does that influence the way humans interact with voice interfaces?
- Mirror
- Authority and subordinate (and vice versa)
- Peer to peer
- Guide and guided
- Co-captains (Michael Knight and K.I.T.T.)
What does this mean for the way we manage other verbal interactions?
retain default female voices? What do designers convey when we default to this?
Why are we the keepers of that decision? What if you could roll your own?
user terms -- both volunteered and recognized
Semantics -- meaning
Syntax -- structure
Accent
Dialect (accent plus syntax plus semantics)
Who leads the charge on iterative design and revision?
shape their interactions with other humans? Is voice a training tool?
The elderly: are we considering the role of compassion and companionship for the lonely? What are the ethics behind using family members’ voices in assistants?
Neurodiverse users: factual responses and a lack of non-verbal cueing can ease the way for autistic users
new syntax
Streamlining voice interaction
• Role of “earcons”
• “Alexa Brief”: we change how we communicate in order to accommodate the computer’s needs
• Also streamlining: reduction of “wake” words for Alexa and Google to create a more seamless and efficient interaction
a human model for HCI interactions
Instead, emphasize haptics and audio cues (“earcons”)
Could address multiple issues:
- Shifts us away from human-human conversational expectations
- Greater accessibility for disabled users
- Reduction in voice-based bias
- More flexibility for users placed under temporary limitations
voice… Audio “earcons,” haptics, visual cues, and physical gesture can form a communication ecosystem that reduces bias and allows optimal accessibility. As potential voice designers, maybe we won’t be designing for voice.