Audio-First Voice Development: The good, the bad & the ugly

Nara Kasbergen (@xiehan) | NPR VOICE Summit | Tuesday, July 23

Why does NPR care about voice? Then: Now:

Get ready for the face-off! [Alexa] vs. [Google Assistant]

Scenario: Live station streams ("Play NPR")
◎ Individual NPR member stations provide live streams
  ◉ mp3 or aac audio file format, sometimes uses .pls or .m3u
◎ This skill/action helps users find a station & listen to the stream

Scenario: Live station streams [Alexa]
◎ Supports streaming audio
◎ Multiple file formats: mp3, aac, pls, m3u…
◎ PlaybackFailed event
verdict: Good!
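As a rough illustration of the "supports streaming audio" point, here is a minimal sketch of a skill response that starts a live stream with the Alexa AudioPlayer interface. The function name, stream URL and token are hypothetical placeholders, not NPR's actual code.

```python
import json


def build_live_stream_response(stream_url: str, token: str) -> dict:
    """Build a skill response that starts playing one audio stream.

    stream_url and token are supplied by the caller; the values used
    below are made-up examples. Alexa expects an HTTPS stream URL.
    """
    return {
        "version": "1.0",
        "response": {
            "directives": [
                {
                    "type": "AudioPlayer.Play",
                    # REPLACE_ALL drops anything already queued and
                    # starts this stream immediately.
                    "playBehavior": "REPLACE_ALL",
                    "audioItem": {
                        "stream": {
                            "url": stream_url,
                            "token": token,
                            "offsetInMilliseconds": 0,
                        }
                    },
                }
            ],
            # The skill session ends once audio playback takes over.
            "shouldEndSession": True,
        },
    }


if __name__ == "__main__":
    response = build_live_stream_response(
        "https://example.org/station/live.mp3",  # hypothetical URL
        "station-live",                          # hypothetical token
    )
    print(json.dumps(response, indent=2))
```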

Scenario: Live station streams [Google Assistant]
◎ AoG doesn't support streaming audio
◎ Have to use a separate, non-public-access API called Media Actions
verdict: Bad!

Scenario: NPR One (continuous play)
◎ Audio in short segments (2-3 minutes), mixed with podcasts
  ◉ mp3 or aac audio file format
◎ Continuous playlist
◎ Users can pause, resume, skip, fast-forward, rewind, mark as interesting, ask what's playing

Scenario: NPR One (continuous play) [Alexa]
◎ Supports seamless autoplay
◎ Built-in intents for pause, resume, start over, skip
◎ Auto-resumes after TTS
◎ PlaybackFailed event
verdict: Good!
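To make the built-in pause/resume intents concrete, here is a minimal sketch of a handler that maps them to AudioPlayer directives. `CATALOG` and `handle_play_control` are hypothetical names for illustration, and the sketch assumes the request context carries the token and offset of the most recent stream.

```python
# Hypothetical lookup from stream token to audio URL (illustration only).
CATALOG = {"story-1": "https://example.org/audio/story-1.mp3"}


def handle_play_control(event: dict) -> dict:
    """Translate AMAZON.PauseIntent / AMAZON.ResumeIntent into directives."""
    intent_name = event["request"]["intent"]["name"]
    # Playback state reported by the device alongside the request.
    player = event.get("context", {}).get("AudioPlayer", {})
    directives = []

    if intent_name == "AMAZON.PauseIntent":
        # Stopping playback is how a skill implements "pause".
        directives.append({"type": "AudioPlayer.Stop"})
    elif intent_name == "AMAZON.ResumeIntent" and player.get("token") in CATALOG:
        token = player["token"]
        directives.append(
            {
                "type": "AudioPlayer.Play",
                "playBehavior": "REPLACE_ALL",
                "audioItem": {
                    "stream": {
                        "url": CATALOG[token],
                        "token": token,
                        # Pick up where the listener left off.
                        "offsetInMilliseconds": player.get("offsetInMilliseconds", 0),
                    }
                },
            }
        )

    return {
        "version": "1.0",
        "response": {"directives": directives, "shouldEndSession": True},
    }
```

A real skill would likely persist the offset itself rather than rely only on the request context; that part is left out here.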

Scenario: NPR One (continuous play) [Alexa]
◎ Skill session ends when audio playback starts
◎ No built-in intents for fast-forward, rewind, "what's playing?"
◎ No way to do heartbeats
verdict: Bad!

Scenario: NPR One (continuous play) [Alexa]
◎ PlaybackNearlyFinished event is misnamed
◎ Can only queue up audio in response to PNF event
verdict: Ugly!!!
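The "queue only in response to PNF" constraint can be sketched as follows: when a PlaybackNearlyFinished request arrives, the skill answers with an ENQUEUE directive for the next segment. `handle_playback_nearly_finished` and the shape of `next_item` are assumptions for illustration; how the next segment is chosen is out of scope.

```python
def handle_playback_nearly_finished(event: dict, next_item: dict) -> dict:
    """Queue the next segment in response to a PlaybackNearlyFinished request.

    next_item is a hypothetical dict with "url" and "token" keys.
    """
    request = event["request"]
    if request["type"] != "AudioPlayer.PlaybackNearlyFinished":
        raise ValueError("expected an AudioPlayer.PlaybackNearlyFinished request")

    return {
        "version": "1.0",
        "response": {
            "directives": [
                {
                    "type": "AudioPlayer.Play",
                    # ENQUEUE appends to the queue instead of interrupting.
                    "playBehavior": "ENQUEUE",
                    "audioItem": {
                        "stream": {
                            "url": next_item["url"],
                            "token": next_item["token"],
                            # Must match the stream that is playing now,
                            # whose token arrives with the request.
                            "expectedPreviousToken": request["token"],
                            "offsetInMilliseconds": 0,
                        }
                    },
                }
            ]
            # No speech here: responses to AudioPlayer requests
            # carry directives only.
        },
    }
```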

Scenario: NPR One (continuous play) [Google Assistant]
◎ Supports seamless autoplay*
◎ Built-in intents for pause, resume, start over
◎ Automatic play controls in the Assistant app, including fast-forward and rewind
verdict: Good!

Scenario: NPR One (continuous play) [Google Assistant]
◎ Seamless autoplay requires a hack: empty mp3 in an SSML <audio> tag (half a second of silence)
◎ Only supports mp3 files
◎ Cannot implement rewind/fast-forward purely via voice
verdict: Bad!
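One way the silent-mp3 hack might look, as a sketch only: the spoken part of the response is replaced by an SSML `<audio>` tag pointing at a short silent file, followed by the media item. The payload field names follow the legacy Actions on Google webhook JSON as best recalled and should be checked against the official reference; the function name, URLs and suggestion chip are hypothetical.

```python
from xml.sax.saxutils import quoteattr


def build_silent_media_payload(silence_url: str, track_url: str, track_name: str) -> dict:
    """Build a response whose 'speech' is only a short silent mp3."""
    # quoteattr escapes the URL and wraps it in quotes for use as an attribute.
    ssml = f"<speak><audio src={quoteattr(silence_url)}/></speak>"
    return {
        "expectUserResponse": True,
        "richResponse": {
            "items": [
                # Stand-in for a spoken prompt: silence instead of TTS.
                {"simpleResponse": {"ssml": ssml}},
                {
                    "mediaResponse": {
                        "mediaType": "AUDIO",
                        "mediaObjects": [
                            {"name": track_name, "contentUrl": track_url}
                        ],
                    }
                },
            ],
            # Assumed to be required when the conversation stays open.
            "suggestions": [{"title": "Skip"}],
        },
    }
```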

Scenario: NPR One (continuous play) [Google Assistant]
◎ No "playback failed" event
◎ Audio does not auto-resume after an interruption
◎ No way to start playing audio from a specific offset
verdict: Ugly!!!

Scenario: Program on-demand (launch TBA)
◎ One hour of a popular program played on-demand
◎ Design ask: when the hour-long audio is done playing, have the voice assistant say: "That's all for now. Would you like to listen to your station's live stream?"
???

Scenario: Program on-demand
◎ Cannot have Alexa speak (much less ask a question) after a finite audio file is done playing (using the audio player)
verdict: Bad!

Scenario: Program on-demand
◎ Can have Google Assistant speak or even ask a question after audio is done playing
verdict: Good!

The "Promised Land" for audio
◎ Lifecycle events: audio start, stop, finished, nearly finished, failed
◎ Built-in intents for all play controls
◎ Audio auto-resumes after TTS
◎ Can speak after playing audio
◎ Heartbeat events for analytics

Thank you!
Keep in touch: @xiehan
Please don't contact me on Whova!
