[SelfConf] Principled Product Development for Voice-Based Interfaces

Voice assistants are arguably the hottest consumer technology of 2018; if your employer hasn't already asked your product team to investigate building an application for voice assistants such as Amazon Alexa, Google Home, Apple HomePod or Samsung's Bixby, they likely will soon. The tech is also finally reaching a place where this is a more realistic goal; the initial launches of the Alexa and Google Home platforms were primarily RSS feed parsers with a text-to-speech engine tacked on, but the technology has finally matured and is now able to support full-fledged custom "skills", or voice-based apps. But as with any new technology, we need to be responsible and consider the morality of what we build as we enter this new space. This talk will discuss the ethics of developing applications for voice assistants and voice-based interfaces, based on the firsthand experiences of a software engineer at the United States-based National Public Radio, as her team has spent the past year doing a deep-dive into voice UI development. The audience will learn practical strategies for respecting privacy, dealing responsibly with user data, and designing for an audience that could include children in an unsupervised setting, as well as ponder the thought-provoking implications of issues such as perpetuating gender stereotypes through voice.


Nara Kasbergen

August 17, 2018


  1. 2.
  2. 3.

    Who am I?
    ▪ Senior full-stack web developer
    ▪ At NPR since March 2014
    ▪ Part of a 5-member skunkworks team focused 100% on voice UI development
      ▫ Formed in September 2017
  3. 6.

    18% of Americans own voice-activated speakers, more than doubling over the past year
    Source: NPR + Edison Research Smart Audio Report
  4. 7.
  5. 8.

    Early Adopters / Early Mainstream
    ▪ "It's cool tech"
    ▪ Smart home
    ▪ Older (over half are 45+)
    ▪ 58% female
    ▪ "It's useful"
    ▪ Day-to-day activities
    ▪ Audio listening
    ▪ Increasing usage
    ▪ More engaged
  6. 9.
  7. 12.

    “These devices are evil. We should not allow big corporations like Amazon to spy on our every move.”
    - my boyfriend (and others)
  8. 13.

    Our mission statement
    Be a visionary while guiding new product and service ideas from conception through launch. Along the way, partner with Member stations to create a more informed public, reaching them wherever they are.
  9. 15.

    3 core challenges on voice platforms
    ▪ Respect user privacy as much as possible
    ▪ Anticipate children interacting with your app unsupervised
    ▪ Consider gender … specifically, gender roles and voice
  10. 17.

    “Aren't these devices basically just for spying? I'd be worried about my girlfriend being able to find out everything I say when she's not home.”
    - an acquaintance
  11. 18.

    The challenge: Expectations vs. Reality
    ▪ Nearly all these devices have a "mute" button
      ▫ You can use Wireshark to verify that nothing said while muted is sent to e.g. Amazon
    ▪ Very little data is stored on the device; virtually everything is processed & stored in the cloud
    ▪ Communication happens over HTTPS
  12. 19.

    The challenge: Expectations vs. Reality
    ▪ The developer platform is heavily sandboxed
      ▫ As a developer, I have no control over what happens if my app is not in active use
      ▫ I get no information about e.g. how many people are in the room, which room, etc.
      ▫ Platforms use a strict permission system
  13. 20.

    The status quo: Alexa
    ▪ Gives users a random alphanumeric ID
      ▫ Per device, per install: i.e. if a user uninstalls the app, then reinstalls it, the ID resets!
    ▪ Allows login via OAuth 2.0
      ▫ Access token gets added to requests after successful authorization
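As a rough illustration of the two identifiers described on this slide, the sketch below pulls the random user ID and the optional OAuth access token out of an Alexa-style request body. The field paths (`context.System.user.userId` and `context.System.user.accessToken`) follow the Alexa request JSON format as I understand it; the function name and the sample values are made up for this example.

```python
from typing import Optional, Tuple


def extract_identity(request_envelope: dict) -> Tuple[Optional[str], Optional[str]]:
    """Return (user_id, access_token) from an Alexa-style request body.

    The access token is only present once the user has linked an account,
    so callers should expect None for users who have not logged in.
    """
    user = request_envelope.get("context", {}).get("System", {}).get("user", {})
    return user.get("userId"), user.get("accessToken")


# Hypothetical request bodies, trimmed to the fields used here.
anonymous_request = {
    "context": {"System": {"user": {"userId": "amzn1.ask.account.EXAMPLE"}}}
}
linked_request = {
    "context": {
        "System": {
            "user": {
                "userId": "amzn1.ask.account.EXAMPLE",
                "accessToken": "example-token",
            }
        }
    }
}

print(extract_identity(anonymous_request))
print(extract_identity(linked_request))
```

Treating the token as optional keeps the app usable for people who never log in, which fits the later advice not to ask for login where it is not required.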
  14. 21.

    The status quo: Google Assistant
    ▪ Currently assigns a random user ID, but that is deprecated and will be removed June 1, 2019
    ▪ Strongly encourages login (OAuth 2.0 or Google)
      ▫ Access token gets added to requests after successful authorization
    ▪ Non-signed-in users are subject to voice match
  15. 22.

    Example scenario: Station search
    A user is using the "Play NPR" skill on Alexa for the first time. They are asked to choose a member station in their area. They specify the location "Seattle, Washington" and are given 3 stations to choose from. The user selects KUOW, which we save as their default station for return visits.
  16. 23.

    Short-term data
    ▪ Most recent search query (e.g. "Seattle, Washington")
    ▪ The search results
    ▪ The last thing the voice assistant said/asked

    Long-term data
    ▪ The user's default station
    ▪ The number of times the user has accessed the app
  17. 24.

    Our approach
    ▪ Use a 24-hour TTL on short-term user data
    ▪ Don't ask for login where it's not required
    ▪ When using login, don't store account data
      ▫ Use access token to retrieve on demand
    ▪ Remove all remaining user data when a user uninstalls the app
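One way to picture the TTL and uninstall rules from this approach is a small store that stamps short-term data with an expiry time and can erase everything for a user on request. This is a minimal in-memory sketch, not NPR's implementation: the class name, method names, and dictionary-based storage are all assumptions made for illustration, and a real app would use a database.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

SHORT_TERM_TTL_SECONDS = 24 * 60 * 60  # the 24-hour TTL mentioned on the slide


class UserDataStore:
    """Hypothetical in-memory stand-in for a real database."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock  # injectable so expiry can be tested without waiting
        self._short_term: Dict[str, Tuple[float, Dict[str, Any]]] = {}
        self._long_term: Dict[str, Dict[str, Any]] = {}

    def save_short_term(self, user_id: str, data: Dict[str, Any]) -> None:
        expires_at = self._clock() + SHORT_TERM_TTL_SECONDS
        self._short_term[user_id] = (expires_at, dict(data))

    def get_short_term(self, user_id: str) -> Optional[Dict[str, Any]]:
        entry = self._short_term.get(user_id)
        if entry is None:
            return None
        expires_at, data = entry
        if self._clock() >= expires_at:
            # Expired: delete on read so stale data does not linger.
            del self._short_term[user_id]
            return None
        return data

    def save_long_term(self, user_id: str, data: Dict[str, Any]) -> None:
        self._long_term[user_id] = dict(data)

    def get_long_term(self, user_id: str) -> Optional[Dict[str, Any]]:
        return self._long_term.get(user_id)

    def erase_user(self, user_id: str) -> None:
        """Remove everything held for a user, e.g. on uninstall."""
        self._short_term.pop(user_id, None)
        self._long_term.pop(user_id, None)


# Usage with a fake clock, mirroring the station search scenario.
now = [1000.0]
store = UserDataStore(clock=lambda: now[0])
store.save_short_term("user-1", {"query": "Seattle, Washington"})
store.save_long_term("user-1", {"default_station": "KUOW"})
now[0] += SHORT_TERM_TTL_SECONDS  # a day later the search query is gone
print(store.get_short_term("user-1"), store.get_long_term("user-1"))
```

Keeping short-term and long-term data in separate collections makes it harder to accidentally retain a search query for as long as a saved station.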
  18. 25.

    The challenge: Analytics
    ▪ One of our biggest business requirements is analytics. We can't avoid it.
    ▪ We decided to use Google Analytics
      ▫ Analytics providers for voice are just emerging
    ▪ Recommendation: hash the user's ID before sending data to analytics provider
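The hashing recommendation could look something like the sketch below. The slide only says to hash the ID; using a keyed HMAC-SHA-256 with a server-side secret is my assumption, chosen so that the analytics provider cannot reverse the value by hashing known IDs. The function name and the sample key are hypothetical.

```python
import hashlib
import hmac


def hash_user_id(user_id: str, secret_key: bytes) -> str:
    """Return a stable pseudonym for user_id, suitable as an analytics client ID.

    The same user always maps to the same value (so sessions can still be
    counted), but the platform-issued ID itself never leaves our servers.
    """
    return hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Usage: the key below is a placeholder and would come from configuration.
pseudonym = hash_user_id("amzn1.ask.account.EXAMPLE", b"replace-with-a-real-secret")
print(len(pseudonym))  # a SHA-256 hex digest is 64 characters long
```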
  19. 26.

    GDPR: Sensible privacy guidelines
    ▪ Be aware of what personal data means
    ▪ Hold and process data only if it is absolutely necessary for the completion of a task
    ▪ Have a process in place for erasing a user's data at their request
  20. 28.

    “These devices are particularly appealing to parents/families, and that continues to be the case, with adoption growing more quickly among that segment.”
    - NPR + Edison Research Smart Audio Report
  21. 29.

    43% of Early Mainstream parents purchased the speaker to reduce screen time
    Source: NPR + Edison Research Smart Audio Report
  22. 30.

    Example scenario
    A parent is listening to NPR One on Alexa while their 5-year-old child is playing in the room. They have to step out for a while to make a phone call. When they come back, Alexa is playing an episode of a podcast featuring curse words.
  23. 31.

    Our approach
    ▪ We play a content warning before the start of a story if it features e.g. strong language or disturbing content
    ▪ Always make it easy to skip or stop playing audio; don't interfere with the user's choice
    ▪ As a side note, users must sign up for an NPR account in order to use this app
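The content-warning rule above can be sketched as a small decision step that runs before playback. Everything here is hypothetical: the `Story` fields, the warning wording, and the `SAY:`/`PLAY:` step format are invented for illustration and do not reflect how NPR One actually flags or announces stories.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Story:
    title: str
    audio_url: str
    has_strong_language: bool = False
    has_disturbing_content: bool = False


def content_warning(story: Story) -> Optional[str]:
    """Return the warning to speak before the story, or None if none is needed."""
    reasons = []
    if story.has_strong_language:
        reasons.append("strong language")
    if story.has_disturbing_content:
        reasons.append("disturbing content")
    if not reasons:
        return None
    return "Heads up: this story contains " + " and ".join(reasons) + "."


def playback_plan(story: Story) -> List[str]:
    """List the steps the app takes, with the warning always before the audio."""
    steps = []
    warning = content_warning(story)
    if warning is not None:
        steps.append("SAY: " + warning)
    steps.append("PLAY: " + story.audio_url)
    return steps


episode = Story("Example episode", "https://example.org/episode.mp3", has_strong_language=True)
print(playback_plan(episode))
```

Because the warning is spoken before the audio starts, a listener still has a chance to say "skip" or "stop" first, which matches the second bullet of the approach.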
  24. 32.

    If you do want to serve this audience…
    ▪ Do not require login
    ▪ Avoid retaining user data longer than necessary
    ▪ Simplify your voice interactions; kids are not going to remember complex commands
    ▪ Reward good behavior like saying "please" and "thank you"
  25. 34.

    What's in a name?

    Gendered
    ▪ Alexa (female)
    ▪ Cortana (female)
    ▪ Siri (female)
    ▪ Bixby (male)

    Neutral
    ▪ Google Assistant
  26. 35.
  27. 36.
  28. 37.

    The status quo: Alexa
    ▪ Provides only one (female-sounding) voice at both the device/user and app level
      ▫ 8 new app-level voices are part of an opt-in Developer Preview program
    ▪ Can be renamed ("wake word")
      ▫ Amazon, Echo, Computer
  29. 38.

    The status quo: Google Assistant
    ▪ Designed to be gender-neutral
    ▪ Provides 8 voice options at the device/user level (not labelled by gender, offers a range of voices)
      ▫ Defaults to a female-sounding voice
    ▪ Provides 4 voice options at the app level
      ▫ Defaults to male ("Male 1")
  30. 39.
  31. 40.
  32. 41.

    “I use the male voice because I have two daughters. And they should know that voice assistants can present as male or female.”
    - Ha-Hoa Hamano (product manager)
  33. 42.
  34. 43.

    Our approach
    ▪ … We're still working it out
    ▪ Alexa basically locks you into a female voice
    ▪ Should we care about cross-platform consistency?
    ▪ The dilemma: NPR has already received its fair share of criticism for putting too many cis male voices on the air
  35. 44.

    A solution: SSML <audio>
    ▪ W3C standard supported on all major platforms
    ▪ Use <audio> to embed short, pre-recorded audio
  36. 45.

    My recommendations
    ▪ Where possible, use <audio> instead of TTS
      ▫ Record real voices representative of the diversity of your user base
    ▪ Otherwise, be conscious of perpetuating gender roles and stereotypes
      ▫ Don't make the assistant overly subservient
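To make the `<audio>` recommendation concrete, here is a minimal sketch that builds an SSML response playing a pre-recorded clip and then continuing with synthesized text. The `<speak>` and `<audio src="...">` elements come from the W3C SSML standard; the helper name and the example URL are assumptions, and each platform sets its own limits on audio format, length, and hosting, so check its documentation before relying on this.

```python
from xml.sax.saxutils import escape, quoteattr


def ssml_with_audio(audio_url: str, follow_up_text: str = "") -> str:
    """Build an SSML document that plays a recorded clip, then optional TTS text.

    quoteattr() and escape() keep characters such as & and < from breaking
    the markup when they appear in the URL or the text.
    """
    return "<speak><audio src={}/>{}</speak>".format(
        quoteattr(audio_url), escape(follow_up_text)
    )


# Usage: a hypothetical greeting recorded by a real person, followed by a TTS prompt.
print(ssml_with_audio("https://example.org/welcome.mp3", "Which station would you like?"))
# <speak><audio src="https://example.org/welcome.mp3"/>Which station would you like?</speak>
```

Only the short, fixed prompts need recording; dynamic text such as station names can still fall back to the platform's TTS voice.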
  37. 47.

    Links to research, talks & blog posts
    ▪ NPR + Edison Research Smart Audio Report
    ▪ Finding Your Voice: Building Screenless Interfaces with Node.js (my talk at jsDay 2018)
    ▪ Talking Back To Your Radio: How We Approached Voice UI (npr.design)
    ▪ How To Prototype For Audio-Rich Voice Experiences Without Really Trying (npr.design)
  38. 48.

    Other useful resources
    ▪ Alexa Skill Blueprints (Amazon)
    ▪ Conversation design (Google)
    ▪ Designing Voice Experiences (Smashing Mag)
    ▪ Intelligent Assistants Have Poor Usability: A User Study of Alexa, Google Assistant, and Siri (Nielsen Norman Group)
  39. 49.

    General recommendations
    ▪ Don't embark on this work without a designer
    ▪ Think of it as another form of front-end design
    ▪ Test with real users
    ▪ Remember being in the home is a privilege
    ▪ Empower everyone on the team to speak up and weigh in on design decisions