Building Voice-First iOS Apps

Elaine Dias Batista (@elainedbatista)

Why?

What do I need to do?

➔ Hotword / Wake Word detection
➔ Speech-to-Text / Speech Recognition
➔ NLP / NLU
➔ Text-to-Speech / Voice Synthesis

<wake word> <voice command>
➔ Wake Word Detection: Start listening
➔ STT: Voice command (audio) ⇒ String
➔ NLP: String ⇒ Intent
➔ Business Logic: Textual answer
➔ TTS
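
The stages above can be sketched as plain Swift functions chained together. This is only an illustration under stated assumptions: `Intent`, `VoicePipeline`, the "hey app" wake word, and the keyword matching are all hypothetical, and the "audio" is already text so the flow can be followed without any speech engine.

```swift
import Foundation

// Hypothetical intent type; the talk does not define one.
enum Intent: Equatable {
    case turnOnLights
    case unknown
}

// Each stage is a closure so a real engine can replace the stand-in later.
struct VoicePipeline {
    var wakeWord: String
    var speechToText: (String) -> String   // STT stand-in: input is already text
    var understand: (String) -> Intent     // NLP: String => Intent
    var respond: (Intent) -> String        // business logic: Intent => textual answer
    var speak: (String) -> Void            // TTS stand-in

    // Returns the textual answer, or nil when the wake word is missing.
    func handle(utterance: String) -> String? {
        let lowered = utterance.lowercased()
        let wake = wakeWord.lowercased()
        guard lowered.hasPrefix(wake) else { return nil }
        let command = lowered.dropFirst(wake.count).trimmingCharacters(in: .whitespaces)
        let answer = respond(understand(speechToText(command)))
        speak(answer)
        return answer
    }
}

// Collects what the TTS stand-in was asked to say.
var spoken: [String] = []

let pipeline = VoicePipeline(
    wakeWord: "hey app",
    speechToText: { $0 },
    understand: { $0.contains("lights") ? .turnOnLights : .unknown },
    respond: { $0 == .turnOnLights ? "Turning on the lights." : "Sorry, I did not understand." },
    speak: { spoken.append($0) }
)
```

Keeping each stage behind a closure makes it easy to swap a stand-in for a first-party or third-party engine without touching the rest of the flow.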

How?

3 strategies
1. Integrate with an existing platform
2. Integrate in an existing app
3. In-house development

Integrate with an existing platform
➔ Voice
  ◆ Google Assistant
  ◆ Alexa
➔ Chat
  ◆ Facebook Messenger
  ◆ Slack
  ◆ Telegram

Integrate in an existing app
➔ 1st party APIs
➔ 3rd party SDKs

In-house development

Voice on iOS

Voice Interactions on iOS (timeline figure, 2007 to 2019; legend: iOS Feature, iOS API / Framework)
➔ VoiceOver
➔ Speech / Siri
➔ AVSpeechSynthesizer
➔ SiriKit (Intents, Shortcuts)
➔ Speech Framework
➔ NL Framework
➔ Core ML
➔ Voice Controls

1st party solutions
➔ Using APIs and Frameworks
➔ Using the Intents Extension

➔ Start listening: Siri Shortcuts
➔ Voice command (audio) ⇒ String: Speech Framework
➔ NLP (String ⇒ Intent): Natural Language Framework
➔ Business Logic
➔ TTS (Textual answer): AVSpeechSynthesizer

Siri Shortcuts
➔ Take advantage of Siri to:
  ◆ Perform actions on your app (inside Siri)
  ◆ Open your app to a specific screen
➔ Integrate it by:
  ◆ Declaring an Intent Definition
  ◆ Donating the intent so Siri can learn your user's behaviors and suggest your shortcut
  ◆ Adding phrases to Siri with INUIAddVoiceShortcutButton (Add to Siri)
➔ Make your app accessible from:
  ◆ Spotlight search
  ◆ Lock screen
  ◆ Siri watch face
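
A minimal sketch of the donation and "Add to Siri" steps. It swaps in an `NSUserActivity`-based shortcut instead of a custom Intent Definition, because the intent classes are generated by Xcode and cannot be shown standalone. The activity type, title, phrase, and `OrderViewController` are hypothetical, and the button's delegate wiring (needed to actually present the "Add to Siri" sheet) is left out.

```swift
import UIKit
import Intents
import IntentsUI

// Hypothetical activity type; it would also need to be listed under
// NSUserActivityTypes in the app's Info.plist.
let orderCoffeeActivityType = "com.example.voiceapp.order-coffee"

// Builds an activity that Siri and Spotlight are allowed to surface.
func makeOrderCoffeeActivity() -> NSUserActivity {
    let activity = NSUserActivity(activityType: orderCoffeeActivityType)
    activity.title = "Order my usual coffee"
    activity.isEligibleForSearch = true          // Spotlight search
    activity.isEligibleForPrediction = true      // Siri suggestions (iOS 12+)
    activity.suggestedInvocationPhrase = "Coffee time"
    return activity
}

final class OrderViewController: UIViewController {
    override func viewDidLoad() {
        super.viewDidLoad()

        // Attaching the activity to the visible screen donates it to the system.
        let activity = makeOrderCoffeeActivity()
        userActivity = activity
        activity.becomeCurrent()

        // "Add to Siri" button backed by the same activity.
        // A delegate (INUIAddVoiceShortcutButtonDelegate) is still required
        // to present the system sheet; omitted here for brevity.
        let button = INUIAddVoiceShortcutButton(style: .blackOutline)
        button.shortcut = INShortcut(userActivity: activity)
        button.translatesAutoresizingMaskIntoConstraints = false
        view.addSubview(button)
        NSLayoutConstraint.activate([
            button.centerXAnchor.constraint(equalTo: view.centerXAnchor),
            button.centerYAnchor.constraint(equalTo: view.centerYAnchor)
        ])
    }
}
```

With a custom Intent Definition, the donation would instead go through `INInteraction(intent:response:)` and its `donate` method, using the generated intent class.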

Siri Shortcuts
https://www.macstories.net/stories/ios-and-ipados-13-the-macstories-review/13/

Speech Framework
➔ Live or prerecorded audio
➔ One-minute limit (battery, network)
➔ iOS 13+: supportsOnDeviceRecognition property
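
A rough sketch of transcribing a prerecorded file with the Speech framework, preferring on-device recognition when `supportsOnDeviceRecognition` reports it is available. `FileTranscriber`, `configureOnDevice`, and the "en-US" locale are illustrative choices, not from the talk. It assumes an iOS 13+ deployment target and an `NSSpeechRecognitionUsageDescription` entry in Info.plist.

```swift
import Speech

// Ask for on-device recognition only when the recognizer supports it.
func configureOnDevice(_ request: SFSpeechRecognitionRequest, supported: Bool) {
    request.requiresOnDeviceRecognition = supported
}

final class FileTranscriber {
    // Kept as properties so they stay alive while recognition runs.
    private let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))
    private var task: SFSpeechRecognitionTask?

    // Calls `completion` with the final transcript, or nil on failure.
    func transcribe(fileURL: URL, completion: @escaping (String?) -> Void) {
        SFSpeechRecognizer.requestAuthorization { [weak self] status in
            guard let self = self,
                  status == .authorized,
                  let recognizer = self.recognizer,
                  recognizer.isAvailable else {
                completion(nil)
                return
            }

            let request = SFSpeechURLRecognitionRequest(url: fileURL)
            configureOnDevice(request, supported: recognizer.supportsOnDeviceRecognition)

            self.task = recognizer.recognitionTask(with: request) { result, error in
                if let result = result, result.isFinal {
                    completion(result.bestTranscription.formattedString)
                } else if error != nil {
                    completion(nil)
                }
            }
        }
    }
}
```

For live audio, the same recognizer would be fed through an `SFSpeechAudioBufferRecognitionRequest` instead of a file URL.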

Natural Language Framework
➔ Tokenization
  ◆ Enumerates the words in a string
➔ Language identification
➔ Linguistic Tags
  ◆ Classify nouns, verbs, adjectives, and other parts of speech in a string.
  ◆ Use a linguistic tagger to perform named entity recognition on a string.

Natural Language Framework
➔ Text Embedding
➔ Natural Language Models
  ◆ Custom models: Create ML
    • Create and train custom ML models on your Mac
    https://developer.apple.com/documentation/createml
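
As an illustration of text embedding, the sketch below uses the built-in English word embedding (iOS 13+) to pick the known command word closest to what the user said. `closestCommand(to:among:)` is a hypothetical helper, and the embedding may be unavailable on some systems, in which case it returns nil.

```swift
import NaturalLanguage

// Returns the candidate whose embedding is nearest to `word` (cosine distance),
// or nil when the embedding is unavailable or no candidate is in its vocabulary.
func closestCommand(to word: String, among candidates: [String]) -> String? {
    guard let embedding = NLEmbedding.wordEmbedding(for: .english) else {
        return nil
    }
    return candidates
        .filter { embedding.contains($0) }
        .min { lhs, rhs in
            embedding.distance(between: word, and: lhs) <
                embedding.distance(between: word, and: rhs)
        }
}
```

This kind of fuzzy matching can help when the transcribed word is a synonym of a supported command rather than an exact match.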

AVSpeechSynthesizer
https://nshipster.com/avspeechsynthesizer/
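
A minimal text-to-speech sketch with `AVSpeechSynthesizer`. The helper names `makeUtterance` and `speak` and the "en-US" default are assumptions for the example.

```swift
import AVFoundation

// Builds an utterance for the given text and BCP 47 language code.
func makeUtterance(_ text: String, language: String = "en-US") -> AVSpeechUtterance {
    let utterance = AVSpeechUtterance(string: text)
    // If no voice exists for the language, nil is assigned and the system default is used.
    utterance.voice = AVSpeechSynthesisVoice(language: language)
    utterance.rate = AVSpeechUtteranceDefaultSpeechRate
    return utterance
}

// Keep a long-lived synthesizer; a short-lived local instance can be
// deallocated before it finishes speaking.
let synthesizer = AVSpeechSynthesizer()

// Speaks the textual answer produced by the business logic.
func speak(_ text: String) {
    synthesizer.speak(makeUtterance(text))
}
```

Calling `speak("Turning on the lights.")` queues the utterance; several calls in a row are spoken in order.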

1st party solutions
➔ Using APIs and Frameworks
➔ Using the Intents Extension

➔ Start listening
➔ Voice command (audio) ⇒ String
➔ NLP: String ⇒ Intent
➔ Business Logic
➔ TTS: Textual answer
https://developer.apple.com/documentation/sirikit

Siri Domains
https://developer.apple.com/documentation/sirikit

3rd party solutions

➔ Hotword / Wake Word detection*
  ◆ Picovoice
  ◆ Snowboy
  ◆ Snips
  ◆ OpenEars
  *In-App detection
➔ NLP / NLU
  ◆ Picovoice
  ◆ Snips
  ◆ OpenEars
  ◆ RASA NLU
  ◆ Tock (by Voyages SNCF)
  ◆ Amazon Lex
  ◆ IBM Watson
  ◆ Microsoft
  ◆ Wit.ai (by Facebook)
  ◆ Dialogflow (by Google)
Legend (slide color coding): Offline / Internet connection required / On-premise setup available

Cross-platform advantages: Dialogflow

Cross-platform advantages: Picovoice

Wrapping up

App closed: <wake word> <voice command>
➔ Siri Shortcut: Start listening
➔ Speech FW: Voice command (audio) ⇒ String
➔ Snips: String ⇒ Intent
➔ Business Logic
➔ AVSpeechSynthesizer: Textual answer

App open: <wake word> <voice command>
➔ Picovoice: Start listening
➔ Speech FW: Voice command (audio) ⇒ String
➔ OpenEars: String ⇒ Intent
➔ Business Logic
➔ Visual answer

Text query
➔ NLP: String ⇒ Intent
➔ Business Logic
➔ Visual answer

<wake word> <voice command>
➔ Siri Shortcut: Start listening
➔ Speech FW: Voice command (audio) ⇒ String
➔ Business Logic
➔ AVSpeechSynthesizer: Textual answer


Getting Started
➔ Think about your use case
  ◆ Not every use case should exist on voice
  ◆ Hands-free actions (car, cooking)
  ◆ Search feature
➔ Think about your users and what services they're currently using
  ◆ If several platforms: consider a 3rd-party solution
  ◆ If mostly mobile: consider a 1st-party solution

Last Word
➔ This talk was about technical solutions
➔ You should spend a lot of time designing the conversations and interactions with the user
  ◆ VUI/VUX

Thanks!
@elainedbatista
