
Audio-First Voice Development: The good, the bad & the ugly

At NPR, our interest in voice-based interfaces is obvious; they're a natural fit for our content, which has always taken an audio-first approach. Yet our expectations of voice platforms' capabilities to serve top-notch audio-first experiences have not always meshed with reality. Most of these platforms were designed primarily as vehicles for Text-to-Speech (TTS) interactions, and the ability to play audio beyond the 120 seconds or so allowed by an SSML tag comes with a distinct set of challenges, ranging from hard platform limitations to implementation quirks, documentation oversights, suboptimal user experiences, and essential features that are still missing. While we've proven that it isn't impossible to build a great audio-first experience, the journey it took to get there was almost entirely uphill.

Join a developer from NPR as she discusses the good, the bad, and the ugly of audio-first development on the Amazon Alexa and Google Assistant platforms. Along the way, she'll share her vision of what an ideal audio-first developer platform might look like.

Nara Kasbergen

July 23, 2019

Transcript

  1. Audio-First Voice Development:
    The good, the bad & the ugly
    Nara Kasbergen (@xiehan) | NPR
    VOICE Summit | Tuesday, July 23

  2. Why does NPR care about voice?
    Then: Now:

  3. Get ready for the face-off!
    Amazon Alexa vs. Google Assistant

  4. Scenario: Live station streams ("Play NPR")
    ◎ Individual NPR member stations provide live streams
    ◉ mp3 or aac audio file format, sometimes wrapped in a .pls or .m3u playlist
    ◎ This skill/action helps users find a station & listen to the stream
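
Some stations publish their stream as a .pls or .m3u playlist rather than a direct mp3/aac URL. A skill or action may therefore need to unwrap the playlist before handing a URL to the platform's player. Below is a minimal sketch in Python. It assumes the playlist body has already been fetched. `parse_playlist` and `first_stream_url` are hypothetical helpers and are not part of either platform's SDK.

```python
from __future__ import annotations


def parse_playlist(body: str, filename: str) -> list[str]:
    """Return the stream URLs listed in a .pls or .m3u playlist.

    Hypothetical helper for illustration; no voice platform ships it.
    """
    urls: list[str] = []
    name = filename.lower()
    if name.endswith(".pls"):
        # PLS is INI-like; each entry is a line such as "File1=https://host/stream".
        for line in body.splitlines():
            key, sep, value = line.strip().partition("=")
            if sep and key.lower().startswith("file") and value.strip():
                urls.append(value.strip())
    elif name.endswith(".m3u"):
        # Plain M3U lists one URL per line; lines starting with "#" are
        # comments or #EXTINF metadata and are skipped.
        for line in body.splitlines():
            line = line.strip()
            if line and not line.startswith("#"):
                urls.append(line)
    else:
        raise ValueError(f"not a .pls or .m3u playlist: {filename}")
    return urls


def first_stream_url(body: str, filename: str) -> str | None:
    """Pick the first http(s) entry, or None if the playlist has none."""
    for url in parse_playlist(body, filename):
        if url.startswith(("http://", "https://")):
            return url
    return None
```

This sketch handles only plain M3U. HLS (.m3u8) manifests are a different format and are deliberately not parsed here.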

  5. Scenario: Live station streams on Alexa
    ◎ Supports streaming audio
    ◎ Multiple file formats: mp3, aac, pls, m3u…
    ◎ PlaybackFailed event
    verdict: Good!
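
On Alexa, streaming goes through the AudioPlayer interface. An `AudioPlayer.Play` directive points at the stream URL, and an `AudioPlayer.PlaybackFailed` request arrives if the device cannot play it. The sketch below builds both responses as plain JSON-style dictionaries, without the ASK SDK. The `backups` table, the spoken text and the `:backup` token suffix are illustrative assumptions.

```python
from __future__ import annotations

import logging


def play_directive(url: str, token: str, offset_ms: int = 0) -> dict:
    """An AudioPlayer.Play directive that replaces whatever is playing."""
    return {
        "type": "AudioPlayer.Play",
        "playBehavior": "REPLACE_ALL",
        "audioItem": {
            "stream": {"url": url, "token": token, "offsetInMilliseconds": offset_ms}
        },
    }


def start_stream(url: str, station_id: str) -> dict:
    """Response to an intent such as "Play NPR": start the station's stream."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": "Here's your station."},
            "directives": [play_directive(url, station_id)],
            "shouldEndSession": True,
        },
    }


def on_playback_failed(request: dict, backups: dict[str, str]) -> dict:
    """Handle AudioPlayer.PlaybackFailed by trying a backup stream, if any.

    `backups` (station token -> alternate URL) is a hypothetical lookup table.
    Responses to AudioPlayer requests carry directives only, no speech.
    """
    token = request["token"]
    logging.warning("playback failed for %s: %s", token, request["error"]["type"])
    backup = backups.get(token)
    if backup is None:
        return {"version": "1.0", "response": {}}
    # The new token has no entry in `backups`, so a failing backup stops here.
    return {
        "version": "1.0",
        "response": {"directives": [play_directive(backup, token + ":backup")]},
    }
```

The token suffix prevents a failing backup from retrying itself in a loop. The second failure arrives with a token that has no entry in `backups`, so the handler returns an empty response.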

  6. Scenario: Live station streams on Google Assistant
    ◎ Actions on Google (AoG) doesn't support streaming audio
    ◎ Have to use a separate, non-public-access API called Media Actions
    verdict: Bad!

  7. Scenario: NPR One (continuous play)
    ◎ Audio in short segments (2-3 minutes), mixed with podcasts
    ◉ mp3 or aac audio file format
    ◎ Continuous playlist
    ◎ Users can pause, resume, skip, fast-forward, rewind, mark as interesting, ask what's playing

  8. Scenario: NPR One (continuous play) on Alexa
    ◎ Supports seamless autoplay
    ◎ Built-in intents for pause, resume, start over, skip
    ◎ Auto-resumes after TTS
    ◎ PlaybackFailed event
    verdict: Good!
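
The built-in intents on this slide map fairly directly onto AudioPlayer directives. The sketch below shows that mapping. It assumes the skill keeps its own ordered playlist of segments and reads the current position from the request's `context.AudioPlayer` object. The `Segment` type and the `handle_playback_intent` name are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    token: str  # unique ID, reported back in context.AudioPlayer.token
    url: str


def _play(segment: Segment, offset_ms: int = 0) -> dict:
    return {
        "type": "AudioPlayer.Play",
        "playBehavior": "REPLACE_ALL",
        "audioItem": {"stream": {"url": segment.url, "token": segment.token,
                                 "offsetInMilliseconds": offset_ms}},
    }


def handle_playback_intent(intent: str, audio_player: dict, playlist: list[Segment]) -> dict:
    """Map Alexa's built-in playback intents onto AudioPlayer directives.

    `audio_player` is the request's context.AudioPlayer object, which reports
    the token and offset of the item that was playing.
    """
    token = audio_player.get("token")
    offset = audio_player.get("offsetInMilliseconds", 0)
    index = next((i for i, s in enumerate(playlist) if s.token == token), 0)

    if intent == "AMAZON.PauseIntent":
        directive = {"type": "AudioPlayer.Stop"}
    elif intent == "AMAZON.ResumeIntent":
        directive = _play(playlist[index], offset)  # pick up where we stopped
    elif intent == "AMAZON.StartOverIntent":
        directive = _play(playlist[index], 0)
    elif intent == "AMAZON.NextIntent":  # "skip"
        if index + 1 < len(playlist):
            directive = _play(playlist[index + 1])
        else:
            directive = {"type": "AudioPlayer.Stop"}  # end of playlist
    else:
        raise ValueError(f"unhandled intent: {intent}")
    return {"version": "1.0", "response": {"directives": [directive], "shouldEndSession": True}}
```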

  9. Scenario: NPR One (continuous play) on Alexa
    ◎ Skill session ends when audio playback starts
    ◎ No built-in intents for fast-forward, rewind, "what's playing?"
    ◎ No way to do heartbeats (periodic playback-progress events)
    verdict: Bad!
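
Alexa has no built-in fast-forward and rewind intents. One workaround is a custom intent that restarts the current item at a shifted offset; the sketch below assumes that approach. The 30-second `SKIP_MS` value and the custom seek intent are illustrative assumptions. The session has already ended once playback starts, so such an intent would typically have to be reached through the skill's invocation name.

```python
from __future__ import annotations

SKIP_MS = 30_000  # hypothetical jump size: 30 seconds


def seek(audio_player: dict, url: str, delta_ms: int, duration_ms: int | None = None) -> dict:
    """Emulate fast-forward (delta_ms > 0) or rewind (delta_ms < 0).

    Restarts the current item (same token) at a new offset. The skill must
    look up `url` itself, because context.AudioPlayer only reports the token.
    """
    offset = max(0, audio_player.get("offsetInMilliseconds", 0) + delta_ms)
    if duration_ms is not None:
        offset = min(offset, duration_ms)  # don't seek past the end, if known
    return {
        "version": "1.0",
        "response": {
            "directives": [{
                "type": "AudioPlayer.Play",
                "playBehavior": "REPLACE_ALL",
                "audioItem": {"stream": {"url": url, "token": audio_player["token"],
                                         "offsetInMilliseconds": offset}},
            }],
            "shouldEndSession": True,
        },
    }
```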

  10. Scenario: NPR One (continuous play) on Alexa
    ◎ PlaybackNearlyFinished event is misnamed
    ◎ Can only queue up audio in response to the PlaybackNearlyFinished event
    verdict: Ugly!!!
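
Whatever its timing, PlaybackNearlyFinished is the only point at which the next item can be queued. Below is a minimal sketch of a handler that enqueues the following segment. It assumes a hypothetical ordered list of `(token, url)` pairs. The `expectedPreviousToken` field ties the queued item to the one currently playing.

```python
from __future__ import annotations


def on_playback_nearly_finished(request: dict, playlist: list[tuple[str, str]]) -> dict:
    """Queue the next segment in response to AudioPlayer.PlaybackNearlyFinished.

    `playlist` is a hypothetical ordered list of (token, url) pairs.
    For ENQUEUE, expectedPreviousToken must match the playing stream's token.
    """
    current = request["token"]
    tokens = [token for token, _ in playlist]
    if current not in tokens or tokens.index(current) + 1 >= len(playlist):
        return {"version": "1.0", "response": {}}  # nothing left to queue
    next_token, next_url = playlist[tokens.index(current) + 1]
    return {
        "version": "1.0",
        "response": {
            "directives": [{
                "type": "AudioPlayer.Play",
                "playBehavior": "ENQUEUE",
                "audioItem": {"stream": {
                    "url": next_url,
                    "token": next_token,
                    "expectedPreviousToken": current,
                    "offsetInMilliseconds": 0,
                }},
            }]
        },
    }
```

Responses to AudioPlayer requests like this one cannot include speech, so the handler returns only the directive.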

  11. Scenario: NPR One (continuous play) on Google Assistant
    ◎ Supports seamless autoplay*
    ◎ Built-in intents for pause, resume, start over
    ◎ Automatic play controls in the Assistant app, including fast-forward and rewind
    verdict: Good!

  12. Scenario: NPR One (continuous play) on Google Assistant
    ◎ Seamless autoplay requires a hack: an empty mp3 in SSML (half a second of silence)
    ◎ Only supports mp3 files
    ◎ Cannot implement rewind/fast-forward purely via voice
    verdict: Bad!
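
The sketch below illustrates the half-second-silence hack for a Dialogflow fulfillment webhook. The required simple response is SSML that plays a hypothetical silent mp3 (`SILENCE_MP3`) instead of speaking, and the next media object follows it. The field names reflect our reading of the `payload.google.richResponse` format and should be checked against the Actions on Google reference. The mp3-only rule is enforced with a crude extension check.

```python
from __future__ import annotations

# Hypothetical URL of a ~0.5-second silent mp3 hosted by the action.
SILENCE_MP3 = "https://example.org/audio/silence-500ms.mp3"


def next_media_response(name: str, url: str) -> dict:
    """Fulfillment payload that chains the next segment on Google Assistant.

    A media response is understood to need an accompanying simple response;
    here that response is only half a second of silence instead of TTS.
    """
    if not url.lower().endswith(".mp3"):
        # Crude extension check: per the slide, only mp3 files are supported.
        raise ValueError(f"media responses only accept mp3 in this sketch: {url}")
    return {
        "payload": {
            "google": {
                "expectUserResponse": True,
                "richResponse": {
                    "items": [
                        {"simpleResponse": {
                            "textToSpeech": f'<speak><audio src="{SILENCE_MP3}"/></speak>'
                        }},
                        {"mediaResponse": {
                            "mediaType": "AUDIO",
                            "mediaObjects": [{"name": name, "contentUrl": url}],
                        }},
                    ],
                    # Suggestion chips are assumed to be required alongside media.
                    "suggestions": [{"title": "Skip"}, {"title": "Stop"}],
                },
            }
        }
    }
```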

  13. Scenario: NPR One (continuous play) on Google Assistant
    ◎ No "playback failed" event
    ◎ Audio does not auto-resume after an interruption
    ◎ No way to start playing audio from a specific offset
    verdict: Ugly!!!

  14. Scenario: Program on-demand (launch TBA)
    ◎ One hour of a popular program played on demand
    ◎ Design ask: when the hour-long audio is done playing, have the voice assistant say: "That's all for now. Would you like to listen to your station's live stream?"
    ???

  15. Scenario: Program on-demand
    ◎ Cannot have Alexa speak (much less ask a question) after a finite audio file is done playing (using the audio player)
    verdict: Bad!

  16. Scenario: Program on-demand
    ◎ Can have Google Assistant speak or even ask a question after audio is done playing
    verdict: Good!
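
Google Assistant supports this through the `actions.intent.MEDIA_STATUS` intent, which the action receives when a media response ends. The sketch below implements the on-demand design ask. It assumes the raw Actions on Google request (the object Dialogflow forwards as `originalDetectIntentRequest.payload`) and a Dialogflow-style response. `after_program` and `media_status` are hypothetical names.

```python
from __future__ import annotations

QUESTION = ("That's all for now. Would you like to listen to "
            "your station's live stream?")


def media_status(request: dict) -> str | None:
    """Extract the status from an actions.intent.MEDIA_STATUS request, if any.

    Assumes the conversation-webhook shape: inputs[].arguments[], with the
    status carried in the MEDIA_STATUS argument's `extension` object.
    """
    for inp in request.get("inputs", []):
        if inp.get("intent") != "actions.intent.MEDIA_STATUS":
            continue
        for arg in inp.get("arguments", []):
            if arg.get("name") == "MEDIA_STATUS":
                return arg.get("extension", {}).get("status")
    return None


def after_program(request: dict) -> dict | None:
    """Once the hour-long program finishes, ask about the live stream."""
    if media_status(request) != "FINISHED":
        return None
    return {
        "payload": {
            "google": {
                "expectUserResponse": True,  # keep the mic open for yes/no
                "richResponse": {
                    "items": [{"simpleResponse": {"textToSpeech": QUESTION}}],
                    "suggestions": [{"title": "Yes"}, {"title": "No"}],
                },
            }
        }
    }
```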

  17. The "Promised Land" for audio
    ◎ Lifecycle events: audio start, stop, finished, nearly finished, failed
    ◎ Built-in intents for all play controls
    ◎ Audio auto-resumes after TTS
    ◎ Can speak after playing audio
    ◎ Heartbeat events for analytics
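
None of the following exists on either platform; it is a hypothetical illustration of the wish list. Given start, stop, finished and failed events plus periodic heartbeats, listening time for analytics could be estimated from the event log alone.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class AudioEvent(Enum):
    """Hypothetical lifecycle events from the "promised land" wish list."""
    STARTED = "started"
    STOPPED = "stopped"
    FINISHED = "finished"
    NEARLY_FINISHED = "nearly_finished"
    FAILED = "failed"
    HEARTBEAT = "heartbeat"  # periodic progress ping while audio plays


@dataclass(frozen=True)
class PlayerEvent:
    kind: AudioEvent
    offset_ms: int  # playback position when the event fired


def listened_ms(events: list[PlayerEvent]) -> int:
    """Estimate time actually listened from an ordered event log.

    Only progress observed while playing is summed, so a seek between a
    STOPPED and the next STARTED event is not counted as listening.
    """
    total, last, playing = 0, 0, False
    for event in events:
        if event.kind is AudioEvent.STARTED:
            playing, last = True, event.offset_ms
        elif playing and event.kind in (AudioEvent.HEARTBEAT, AudioEvent.STOPPED,
                                        AudioEvent.FINISHED, AudioEvent.FAILED):
            total += max(0, event.offset_ms - last)
            last = event.offset_ms
            if event.kind is not AudioEvent.HEARTBEAT:
                playing = False
    return total
```

The heartbeat is what makes the estimate robust. If playback dies without a STOPPED event, the time up to the last heartbeat is still counted.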

  18. Thank you!
    Keep in touch: @xiehan
    [email protected]
    please don't contact me on Whova!