Slide 1

Slide 1 text

@elainedbatista Building Voice-First iOS Apps

Slide 2

Slide 2 text

@elainedbatista Elaine Dias Batista

Slide 3

Slide 3 text

@elainedbatista Why?

Slide 4

Slide 4 text

@elainedbatista

Slide 5

Slide 5 text

@elainedbatista

Slide 6

Slide 6 text

@elainedbatista

Slide 7

Slide 7 text

@elainedbatista What do I need to do?

Slide 8

Slide 8 text

@elainedbatista
➔ Hotword / Wake Word detection
➔ Speech-to-Text / Speech Recognition
➔ NLP / NLU
➔ Text-to-Speech / Voice Synthesis

Slide 9

Slide 9 text

@elainedbatista Wake Word Detection (start listening) → STT (voice command (audio) ⇒ String) → NLP (String ⇒ Intent) → Business Logic (textual answer) → TTS
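The stages on this slide can be read as a chain of transformations. Below is a minimal sketch in plain Swift, assuming hypothetical names throughout (`Intent`, `SpeechToText`, `IntentParser`, `BusinessLogic`, `TextToSpeech`, `VoicePipeline` and the stub types are illustrative only, not Apple APIs and not from the talk). Each stage is stubbed so the flow can be followed without any framework.

```swift
import Foundation

// Hypothetical types for illustration only; none of these are Apple APIs.
struct Intent: Equatable {
    let name: String
    let parameters: [String: String]
}

protocol SpeechToText { func transcribe(_ audio: Data) -> String }    // audio ⇒ String
protocol IntentParser { func parse(_ text: String) -> Intent? }       // String ⇒ Intent
protocol BusinessLogic { func answer(for intent: Intent) -> String }  // Intent ⇒ textual answer
protocol TextToSpeech { func speak(_ text: String) }                  // textual answer ⇒ audio

struct VoicePipeline {
    let stt: SpeechToText
    let nlp: IntentParser
    let logic: BusinessLogic
    let tts: TextToSpeech

    // Meant to be called after the wake word fired and a command was recorded.
    @discardableResult
    func handle(command audio: Data) -> String {
        let text = stt.transcribe(audio)
        guard let intent = nlp.parse(text) else {
            let fallback = "Sorry, I did not understand."
            tts.speak(fallback)
            return fallback
        }
        let reply = logic.answer(for: intent)
        tts.speak(reply)
        return reply
    }
}

// Stub stages so the sketch runs on its own.
struct FakeSTT: SpeechToText {
    // Pretends the "audio" bytes are already UTF-8 text.
    func transcribe(_ audio: Data) -> String { String(decoding: audio, as: UTF8.self) }
}

struct KeywordParser: IntentParser {
    func parse(_ text: String) -> Intent? {
        text.lowercased().contains("weather")
            ? Intent(name: "GetWeather", parameters: [:])
            : nil
    }
}

struct WeatherLogic: BusinessLogic {
    func answer(for intent: Intent) -> String { "It is sunny today." }
}

struct PrintTTS: TextToSpeech {
    func speak(_ text: String) { print(text) }
}
```

Keeping each stage behind its own protocol is one way to swap a 1st party framework for a 3rd party SDK later without touching the business logic.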

Slide 10

Slide 10 text

@elainedbatista How?

Slide 11

Slide 11 text

@elainedbatista 3 strategies
1. Integrate with an existing platform
2. Integrate in an existing app
3. In-house development

Slide 12

Slide 12 text

@elainedbatista Integrate with an existing platform
➔ Voice
  ◆ Google Assistant
  ◆ Alexa
➔ Chat
  ◆ Facebook Messenger
  ◆ Slack
  ◆ Telegram

Slide 13

Slide 13 text

@elainedbatista Integrate in an existing app
➔ 1st party APIs
➔ 3rd party SDKs

Slide 14

Slide 14 text

@elainedbatista In-house development

Slide 15

Slide 15 text

@elainedbatista Voice on iOS

Slide 16

Slide 16 text

@elainedbatista Voice Interactions on iOS (timeline: 2007, 2009, 2011, 2013, 2016, 2019)
Entries: VoiceOver, Speech / Siri, AVSpeechSynthesizer, SiriKit (Intents, Shortcuts), Speech Framework, Core ML, NL Framework, Voice Controls
Legend: iOS Feature, iOS API / Framework

Slide 17

Slide 17 text

@elainedbatista 1st party solutions
- Using APIs and Frameworks
- Using the Intents Extension

Slide 18

Slide 18 text

@elainedbatista 1st party solutions
- Using APIs and Frameworks
- Using the Intents Extension

Slide 19

Slide 19 text

@elainedbatista Wake Word Detection (start listening) → STT (voice command (audio) ⇒ String) → NLP (String ⇒ Intent) → Business Logic (textual answer) → TTS

Slide 20

Slide 20 text

@elainedbatista Wake Word Detection (start listening) → STT (voice command (audio) ⇒ String) → NLP (String ⇒ Intent) → Business Logic (textual answer) → TTS
➔ Wake Word Detection: Siri Shortcuts
➔ STT: Speech Framework
➔ NLP: Natural Language Framework
➔ TTS: AVSpeechSynthesizer

Slide 21

Slide 21 text

@elainedbatista Siri Shortcuts
➔ Take advantage of Siri to:
  ◆ Perform actions on your app (inside Siri)
  ◆ Open your app to a specific screen
➔ Integrate it by:
  ◆ Declaring an Intent Definition
  ◆ Donating the intent so Siri can learn your user's behaviors and suggest your shortcut
  ◆ Adding phrases to Siri with INUIAddVoiceShortcutButton (Add to Siri)
➔ Make your app accessible from:
  ◆ Spotlight search
  ◆ Lock screen
  ◆ Siri watch face
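The donation and Add to Siri steps listed above can be sketched as follows. This is a minimal sketch, assuming the concrete intent instance comes from a class that Xcode generates from your Intent Definition file (that class is not shown here and its name would be yours to choose). The helper names `donate(_:suggestedPhrase:)` and `makeAddToSiriButton(for:)` are hypothetical.

```swift
import Intents
import IntentsUI

// Donates an intent so Siri can learn when the user performs this action.
// `intent` is assumed to be an instance of your generated intent class.
func donate(_ intent: INIntent, suggestedPhrase: String) {
    // Phrase Siri proposes when the user records a shortcut for this intent.
    intent.suggestedInvocationPhrase = suggestedPhrase

    let interaction = INInteraction(intent: intent, response: nil)
    interaction.donate { error in
        if let error = error {
            print("Donation failed: \(error.localizedDescription)")
        }
    }
}

// Builds an "Add to Siri" button bound to the given intent.
// INShortcut(intent:) is failable, so the button may end up without a shortcut.
func makeAddToSiriButton(for intent: INIntent) -> INUIAddVoiceShortcutButton {
    let button = INUIAddVoiceShortcutButton(style: .black)
    button.shortcut = INShortcut(intent: intent)
    return button
}
```

To actually present the recording screen, the button also needs a delegate conforming to `INUIAddVoiceShortcutButtonDelegate`, which is left out here for brevity.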

Slide 22

Slide 22 text

@elainedbatista Siri Shortcuts https://www.macstories.net/stories/ios-and-ipados-13-the-macstories-review/13/

Slide 23

Slide 23 text

@elainedbatista Speech Framework
➔ Live or prerecorded audio
➔ One minute limit (battery, network)
➔ iOS 13+: supportsOnDeviceRecognition property
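As an illustration of the points above, here is a minimal sketch that transcribes a prerecorded audio file and opts into on-device recognition when the recognizer supports it. The function names, the `TranscriptionError` type and the `en-US` locale are assumptions made for the example. The app would also need an `NSSpeechRecognitionUsageDescription` entry in its Info.plist.

```swift
import Foundation
import Speech

// Hypothetical error type for this sketch.
enum TranscriptionError: Error {
    case recognizerUnavailable
}

// Asks the user for speech recognition permission.
func requestSpeechPermission(_ completion: @escaping (Bool) -> Void) {
    SFSpeechRecognizer.requestAuthorization { status in
        DispatchQueue.main.async { completion(status == .authorized) }
    }
}

// Transcribes a prerecorded file and reports the final transcription.
@discardableResult
func transcribeFile(at url: URL,
                    completion: @escaping (Result<String, Error>) -> Void) -> SFSpeechRecognitionTask? {
    guard let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US")),
          recognizer.isAvailable else {
        completion(.failure(TranscriptionError.recognizerUnavailable))
        return nil
    }

    let request = SFSpeechURLRecognitionRequest(url: url)

    // iOS 13+: keep the audio on the device when the recognizer supports it.
    if #available(iOS 13, macOS 10.15, *), recognizer.supportsOnDeviceRecognition {
        request.requiresOnDeviceRecognition = true
    }

    return recognizer.recognitionTask(with: request) { result, error in
        if let error = error {
            completion(.failure(error))
            return
        }
        if let result = result, result.isFinal {
            completion(.success(result.bestTranscription.formattedString))
        }
    }
}
```

For live audio, the same recognizer would typically be fed through `SFSpeechAudioBufferRecognitionRequest` instead of a file URL.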

Slide 24

Slide 24 text

@elainedbatista Natural Language Framework
➔ Tokenization
  ◆ Enumerates the words in a string
➔ Language identification
➔ Linguistic Tags
  ◆ Classify nouns, verbs, adjectives, and other parts of speech in a string.
  ◆ Use a linguistic tagger to perform named entity recognition on a string.
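The three capabilities listed above map onto `NLTokenizer`, `NLLanguageRecognizer` and `NLTagger`. A minimal sketch follows; the wrapper function names are hypothetical and chosen for the example only.

```swift
import NaturalLanguage

// Tokenization: enumerate the words in a string.
func words(in text: String) -> [String] {
    let tokenizer = NLTokenizer(unit: .word)
    tokenizer.string = text
    return tokenizer.tokens(for: text.startIndex..<text.endIndex).map { String(text[$0]) }
}

// Language identification: most likely language, or nil if undetermined.
func dominantLanguage(of text: String) -> NLLanguage? {
    NLLanguageRecognizer.dominantLanguage(for: text)
}

// Runs a tagger over the words of `text` for the given scheme.
private func tags(in text: String,
                  scheme: NLTagScheme,
                  options: NLTagger.Options) -> [(word: String, tag: NLTag)] {
    let tagger = NLTagger(tagSchemes: [scheme])
    tagger.string = text
    var result: [(word: String, tag: NLTag)] = []
    tagger.enumerateTags(in: text.startIndex..<text.endIndex,
                         unit: .word,
                         scheme: scheme,
                         options: options) { tag, range in
        if let tag = tag {
            result.append((String(text[range]), tag))
        }
        return true // keep enumerating
    }
    return result
}

// Parts of speech: noun, verb, adjective, and so on.
func partsOfSpeech(in text: String) -> [(word: String, tag: NLTag)] {
    tags(in: text, scheme: .lexicalClass, options: [.omitPunctuation, .omitWhitespace])
}

// Named entity recognition: keep only people, places and organizations.
func namedEntities(in text: String) -> [(word: String, tag: NLTag)] {
    let wanted: Set<NLTag> = [.personalName, .placeName, .organizationName]
    return tags(in: text,
                scheme: .nameType,
                options: [.omitPunctuation, .omitWhitespace, .joinNames])
        .filter { wanted.contains($0.tag) }
}
```

For a voice command such as "Turn on the kitchen lights", `words(in:)` yields the individual words, while the tagger output could feed a simple String ⇒ Intent mapping.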

Slide 25

Slide 25 text

@elainedbatista Natural Language Framework
➔ Text Embedding
➔ Natural Language Models
  ◆ Custom models: Create ML
    ● Create and train custom ML models on your Mac
https://developer.apple.com/documentation/createml

Slide 26

Slide 26 text

@elainedbatista AVSpeechSynthesizer

Slide 27

Slide 27 text

@elainedbatista AVSpeechSynthesizer https://nshipster.com/avspeechsynthesizer/
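A minimal sketch of the text-to-speech step with `AVSpeechSynthesizer`. The `Speaker` wrapper class and its method names are hypothetical, and the `en-US` voice is an assumption for the example.

```swift
import AVFoundation

final class Speaker {
    // Keep a strong reference: speech can be cut short if the
    // synthesizer is deallocated while it is still speaking.
    private let synthesizer = AVSpeechSynthesizer()

    // Builds an utterance with a voice for the given BCP 47 language code.
    func makeUtterance(_ text: String, language: String = "en-US") -> AVSpeechUtterance {
        let utterance = AVSpeechUtterance(string: text)
        utterance.voice = AVSpeechSynthesisVoice(language: language)
        utterance.rate = AVSpeechUtteranceDefaultSpeechRate
        utterance.pitchMultiplier = 1.0
        return utterance
    }

    // Queues the text for speaking.
    func speak(_ text: String, language: String = "en-US") {
        synthesizer.speak(makeUtterance(text, language: language))
    }
}
```

A call such as `Speaker().speak("It is sunny today.")` would need the `Speaker` instance stored somewhere long-lived, for example as a property of a view controller.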

Slide 28

Slide 28 text

@elainedbatista 1st party solutions
- Using APIs and Frameworks
- Using the Intents Extension

Slide 29

Slide 29 text

@elainedbatista Wake Word Detection (start listening) → STT (voice command (audio) ⇒ String) → NLP (String ⇒ Intent) → Business Logic (textual answer) → TTS

Slide 30

Slide 30 text

@elainedbatista Wake Word Detection (start listening) → STT (voice command (audio) ⇒ String) → NLP (String ⇒ Intent) → Business Logic (textual answer) → TTS
https://developer.apple.com/documentation/sirikit

Slide 31

Slide 31 text

@elainedbatista Siri Domains https://developer.apple.com/documentation/sirikit

Slide 32

Slide 32 text

@elainedbatista 3rd party solutions

Slide 33

Slide 33 text

@elainedbatista
Hotword / Wake Word detection
● Picovoice
● Snowboy
● Snips
● OpenEars
*In-App detection
NLP / NLU
● Picovoice
● Snips
● OpenEars
● RASA NLU
● Tock (by Voyages SNCF)
● Amazon Lex
● IBM Watson
● Microsoft
● Wit.ai (by Facebook)
● Dialogflow (by Google)
Legend: Offline, Internet connection required, On-premise setup available

Slide 34

Slide 34 text

@elainedbatista Cross-platform advantages: Dialogflow

Slide 35

Slide 35 text

@elainedbatista Cross-platform advantages: Picovoice

Slide 36

Slide 36 text

@elainedbatista Wrapping up

Slide 37

Slide 37 text

@elainedbatista Wake Word Detection (start listening) → STT (voice command (audio) ⇒ String) → NLP (String ⇒ Intent) → Business Logic (textual answer) → TTS

Slide 38

Slide 38 text

@elainedbatista
App closed: Siri Shortcut (start listening) → Speech FW (voice command (audio) ⇒ String) → Snips (String ⇒ Intent) → Business Logic (textual answer) → AVSpeechSynthesizer
App open: Picovoice (start listening) → Speech FW (voice command (audio) ⇒ String) → OpenEars (String ⇒ Intent) → Business Logic → Visual answer

Slide 39

Slide 39 text

@elainedbatista
Text query → NLP (String ⇒ Intent) → Business Logic → Visual answer
Siri Shortcut (start listening) → Speech FW (voice command (audio) ⇒ String) → Business Logic (textual answer) → AVSpeechSynthesizer

Slide 40

Slide 40 text

@elainedbatista Wake Word Detection (start listening) → STT (voice command (audio) ⇒ String) → NLP (String ⇒ Intent) → Business Logic (textual answer) → TTS
https://developer.apple.com/documentation/sirikit

Slide 41

Slide 41 text

@elainedbatista Getting Started

Slide 42

Slide 42 text

@elainedbatista Getting Started
➔ Think about your use case
  ◆ Not every use case should exist on voice
  ◆ Hands-free actions (car, cooking)
  ◆ Search feature
➔ Think about your users and what services they're currently using
  ◆ If several platforms: consider a 3rd party solution
  ◆ If mostly mobile: consider a 1st party solution

Slide 43

Slide 43 text

@elainedbatista Last Word
➔ This talk was about technical solutions
➔ You should spend a lot of time designing the conversations and interactions with the user
  ◆ VUI/VUX

Slide 44

Slide 44 text

@elainedbatista Thanks!