Building Voice-First iOS Apps

elainedb
October 08, 2019

Thanks to recent advances in machine learning, we can now interact with machines through natural language. The age of voice assistants is here, with Siri, Alexa and others. But as an iOS developer, what conversational features can I add to my existing app?
When we think about voice-forward features, we think of existing voice assistants such as Alexa and Siri. But what about the fully capable computers we carry with us all the time, our smartphones? Some moments in our day-to-day lives are very well suited to voice interactions: driving or cooking, for example. Let's not forget that voice interactions are also highly accessible, not only physically (for people with dexterity or motor impairments) but also cognitively (we all have a loved one who really struggles with technology, and people in some emerging countries have very limited access to computers and are not at ease with technology).

In this talk, I'll explain what integrations can be done in iOS:
- 1st-party solutions such as the Natural Language Framework and Siri Shortcuts
- 3rd-party solutions such as Porcupine, Snips, Dialogflow, Amazon Lex, RASA and many others

In summary, this talk will help you think about why and how to implement conversational features in your app.

Transcript

  1. 9.

    @elainedbatista
    <wake word> <voice command>
    Wake Word Detection [start listening] → STT [voice command (audio) ⇒ String] → NLP [String ⇒ Intent] → Business Logic [textual answer] → TTS
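
The flow on this slide can be sketched in plain Swift. This is only an illustration under stated assumptions: `Intent`, `VoicePipeline`, and the stub closures are hypothetical names that do not come from the talk, and each closure stands in for one box of the diagram. Wake word detection is assumed to have already fired before `handle(commandAudio:)` is called.

```swift
import Foundation

// Hypothetical intent type: the output of the NLP stage.
enum Intent: Equatable {
    case turnOnLights
    case unknown
}

// Hypothetical pipeline: one closure per stage after wake word detection.
struct VoicePipeline {
    var speechToText: ([Float]) -> String   // STT: voice command (audio) ⇒ String
    var understand: (String) -> Intent      // NLP: String ⇒ Intent
    var businessLogic: (Intent) -> String   // Business logic: Intent ⇒ textual answer
    var speak: (String) -> Void             // TTS: reads the textual answer aloud

    // Called once the wake word has been detected and the command audio captured.
    func handle(commandAudio: [Float]) -> String {
        let text = speechToText(commandAudio)
        let intent = understand(text)
        let answer = businessLogic(intent)
        speak(answer)
        return answer
    }
}

// Stub stages so the sketch runs without any speech framework.
var spoken: [String] = []
let pipeline = VoicePipeline(
    speechToText: { _ in "turn on the lights" },
    understand: { $0.contains("lights") ? .turnOnLights : .unknown },
    businessLogic: { $0 == .turnOnLights ? "Lights are on" : "Sorry, I did not get that" },
    speak: { spoken.append($0) }
)
let answer = pipeline.handle(commandAudio: [])
```

Keeping each stage behind a closure (or a protocol) makes it possible to swap a 1st-party implementation for a 3rd-party one without touching the business logic.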
  2. 11.

    3 strategies
    1. Integrate with an existing platform
    2. Integrate in an existing app
    3. In-house development
  3. 12.

    Integrate with an existing platform
    ➔ Voice
      ◆ Google Assistant
      ◆ Alexa
    ➔ Chat
      ◆ Facebook Messenger
      ◆ Slack
      ◆ Telegram
  4. 16.

    Voice Interactions on iOS
    Timeline: 2007, 2009, 2011, 2013, 2016, 2019
    Items: VoiceOver, Speech / Siri, AVSpeechSynthesizer, SiriKit (Intents, Shortcuts), Speech Framework, NL Framework, Voice Controls, Core ML
    Legend: iOS Feature, iOS API / Framework
  5. 19.

    <wake word> <voice command>
    Wake Word Detection [start listening] → STT [voice command (audio) ⇒ String] → NLP [String ⇒ Intent] → Business Logic [textual answer] → TTS
  6. 20.

    <wake word> <voice command>
    Wake Word Detection [start listening] → STT [voice command (audio) ⇒ String] → NLP [String ⇒ Intent] → Business Logic [textual answer] → TTS
    Siri Shortcuts (wake word), Speech Framework (STT), Natural Language Framework (NLP), AVSpeechSynthesizer (TTS)
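
For the TTS end of this mapping, a minimal sketch with `AVSpeechSynthesizer` could look like the following. `AnswerSpeaker` and `makeUtterance(for:languageCode:)` are hypothetical helper names, and the `"en-US"` default is an assumption, not something the slide specifies.

```swift
import AVFoundation

// Hypothetical helper: builds the utterance for a textual answer.
func makeUtterance(for answer: String, languageCode: String = "en-US") -> AVSpeechUtterance {
    let utterance = AVSpeechUtterance(string: answer)
    // The voice is nil if no voice exists for this language; the system default is then used.
    utterance.voice = AVSpeechSynthesisVoice(language: languageCode)
    utterance.rate = AVSpeechUtteranceDefaultSpeechRate
    return utterance
}

// Hypothetical wrapper that owns the synthesizer.
final class AnswerSpeaker {
    // Keep a strong reference so the synthesizer outlives the call to speak(_:).
    private let synthesizer = AVSpeechSynthesizer()

    func speak(_ answer: String) {
        synthesizer.speak(makeUtterance(for: answer))
    }
}
```

In an app, `AnswerSpeaker().speak("Lights are on")` would typically be called from the business logic once the textual answer is ready.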
  7. 21.

    Siri Shortcuts
    ➔ Take advantage of Siri to:
      ◆ Perform actions on your app (inside Siri)
      ◆ Open your app to a specific screen
    ➔ Integrate it by:
      ◆ Declaring an Intent Definition
      ◆ Donating the intent so Siri can learn your user's behaviors and suggest your shortcut
      ◆ Adding phrases to Siri with INUIAddVoiceShortcutButton (Add to Siri)
    ➔ Make your app accessible from:
      ◆ Spotlight search
      ◆ Lock screen
      ◆ Siri watch face
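
The slide describes custom intents declared in an Intent Definition file. Because that intent class is generated by Xcode, the sketch below swaps in the `NSUserActivity` route instead, which covers the "open your app to a specific screen" case. The activity type, title, phrase, and `OrderViewController` are all hypothetical, and the button's delegate (which presents the Add to Siri screen) is omitted for brevity.

```swift
import Intents
import IntentsUI
import UIKit

// Hypothetical activity type; it is assumed to be listed under NSUserActivityTypes in Info.plist.
let showOrderActivityType = "com.example.app.show-order"

func makeShowOrderActivity() -> NSUserActivity {
    let activity = NSUserActivity(activityType: showOrderActivityType)
    activity.title = "Show my last order"
    activity.isEligibleForSearch = true        // Spotlight search
    activity.isEligibleForPrediction = true    // Siri suggestions (iOS 12+)
    activity.suggestedInvocationPhrase = "Show my order"
    activity.persistentIdentifier = showOrderActivityType
    return activity
}

final class OrderViewController: UIViewController {
    override func viewDidLoad() {
        super.viewDidLoad()

        // Donation: making the activity current tells the system the user just did this.
        let activity = makeShowOrderActivity()
        userActivity = activity
        activity.becomeCurrent()

        // "Add to Siri" button bound to the same shortcut.
        let button = INUIAddVoiceShortcutButton(style: .whiteOutline)
        button.shortcut = INShortcut(userActivity: activity)
        button.translatesAutoresizingMaskIntoConstraints = false
        view.addSubview(button)
        NSLayoutConstraint.activate([
            button.centerXAnchor.constraint(equalTo: view.centerXAnchor),
            button.centerYAnchor.constraint(equalTo: view.centerYAnchor)
        ])
    }
}
```

With a custom intent, the donation step would instead go through `INInteraction(intent:response:)` and its `donate(completion:)` method.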
  8. 23.

    Speech Framework
    ➔ Live or prerecorded audio
    ➔ One minute limit (battery, network)
    ➔ iOS13+: supportsOnDeviceRecognition property
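
As a rough sketch of the prerecorded-audio case, the function below transcribes a local file and opts into on-device recognition when `supportsOnDeviceRecognition` reports it. `transcribe(audioURL:locale:completion:)` and `TranscriptionError` are hypothetical names, the `"en-US"` locale is an assumption, and the app is assumed to declare `NSSpeechRecognitionUsageDescription` in its Info.plist.

```swift
import Speech

// Hypothetical error for the cases where recognition cannot start.
enum TranscriptionError: Error {
    case unavailable
}

// Transcribes a prerecorded audio file; `audioURL` is assumed to point to a local recording.
func transcribe(audioURL: URL,
                locale: Locale = Locale(identifier: "en-US"),
                completion: @escaping (Result<String, Error>) -> Void) {
    SFSpeechRecognizer.requestAuthorization { status in
        guard status == .authorized,
              let recognizer = SFSpeechRecognizer(locale: locale),
              recognizer.isAvailable else {
            completion(.failure(TranscriptionError.unavailable))
            return
        }

        let request = SFSpeechURLRecognitionRequest(url: audioURL)
        // iOS 13+: keep audio on the device when the recognizer supports it.
        if #available(iOS 13.0, *), recognizer.supportsOnDeviceRecognition {
            request.requiresOnDeviceRecognition = true
        }

        _ = recognizer.recognitionTask(with: request) { result, error in
            if let error = error {
                completion(.failure(error))
            } else if let result = result, result.isFinal {
                completion(.success(result.bestTranscription.formattedString))
            }
        }
    }
}
```

For live audio, the same recognizer would be fed an `SFSpeechAudioBufferRecognitionRequest` filled from an `AVAudioEngine` tap instead of a URL request.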
  9. 24.

    Natural Language Framework
    ➔ Tokenization
      ◆ Enumerates the words in a string
    ➔ Language identification
    ➔ Linguistic tags
      ◆ Classify nouns, verbs, adjectives, and other parts of speech in a string.
      ◆ Use a linguistic tagger to perform named entity recognition on a string.
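
The three capabilities on this slide map onto `NLTokenizer`, `NLLanguageRecognizer`, and `NLTagger`. The helper names below (`words(in:)`, `dominantLanguage(of:)`, `tags(in:scheme:)`) are hypothetical wrappers written for this sketch, not framework API.

```swift
import NaturalLanguage

// Tokenization: enumerate the words in a string.
func words(in text: String) -> [String] {
    let tokenizer = NLTokenizer(unit: .word)
    tokenizer.string = text
    return tokenizer.tokens(for: text.startIndex..<text.endIndex).map { String(text[$0]) }
}

// Language identification: the most likely language, or nil if undetermined.
func dominantLanguage(of text: String) -> NLLanguage? {
    NLLanguageRecognizer.dominantLanguage(for: text)
}

// Linguistic tags: pass .lexicalClass for parts of speech,
// or .nameType for named entity recognition.
func tags(in text: String, scheme: NLTagScheme) -> [(word: String, tag: NLTag)] {
    let tagger = NLTagger(tagSchemes: [scheme])
    tagger.string = text
    var result: [(word: String, tag: NLTag)] = []
    let options: NLTagger.Options = [.omitWhitespace, .omitPunctuation, .joinNames]
    tagger.enumerateTags(in: text.startIndex..<text.endIndex,
                         unit: .word,
                         scheme: scheme,
                         options: options) { tag, range in
        if let tag = tag {
            result.append((word: String(text[range]), tag: tag))
        }
        return true // keep enumerating
    }
    return result
}
```

For a voice command, `tags(in: command, scheme: .nameType)` is one way to pull out people, places, and organizations before handing the intent to the business logic.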
  10. 25.

    Natural Language Framework
    ➔ Text Embedding
    ➔ Natural Language Models
      ◆ Custom models: Create ML
        • Create and train custom ML models on your Mac (https://developer.apple.com/documentation/createml)
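
One possible use of text embeddings in a voice feature is matching a spoken word to the nearest known command word. The sketch below assumes the built-in English word embedding is available on the device (iOS 13+); `closestCommand(to:among:)` and its candidate list are hypothetical.

```swift
import NaturalLanguage

// Returns the candidate whose embedding is closest to `word`,
// or nil if no English word embedding is available or the list is empty.
@available(iOS 13.0, macOS 10.15, *)
func closestCommand(to word: String, among candidates: [String]) -> String? {
    guard let embedding = NLEmbedding.wordEmbedding(for: .english) else { return nil }
    let query = word.lowercased()
    return candidates.min { first, second in
        embedding.distance(between: query, and: first, distanceType: .cosine)
            < embedding.distance(between: query, and: second, distanceType: .cosine)
    }
}
```

For intent classification proper, a custom text classifier trained with Create ML and loaded through `NLModel` is the route this slide points to.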
  11. 29.

    <wake word> <voice command>
    Wake Word Detection [start listening] → STT [voice command (audio) ⇒ String] → NLP [String ⇒ Intent] → Business Logic [textual answer] → TTS
  12. 30.

    <wake word> <voice command>
    Wake Word Detection [start listening] → STT [voice command (audio) ⇒ String] → NLP [String ⇒ Intent] → Business Logic [textual answer] → TTS
    https://developer.apple.com/documentation/sirikit
  13. 33.

    Hotword / Wake Word detection
    • Picovoice
    • Snowboy
    • Snips
    • OpenEars
    * In-App detection
    NLP / NLU
    • Picovoice
    • Snips
    • OpenEars
    • RASA NLU
    • Tock (by Voyages SNCF)
    • Amazon Lex
    • IBM Watson
    • Microsoft
    • Wit.ai (by Facebook)
    • Dialogflow (by Google)
    Legend: Offline, Internet connection required, On-premise setup available
  14. 37.

    <wake word> <voice command>
    Wake Word Detection [start listening] → STT [voice command (audio) ⇒ String] → NLP [String ⇒ Intent] → Business Logic [textual answer] → TTS
  15. 38.

    App closed:
    <wake word> <voice command>
    Siri Shortcut [start listening] → Speech FW [voice command (audio) ⇒ String] → Snips [String ⇒ Intent] → Business Logic [textual answer] → AVSpeechSynthesizer
    App open:
    <wake word> <voice command>
    Picovoice [start listening] → Speech FW [voice command (audio) ⇒ String] → OpenEars [String ⇒ Intent] → Business Logic [visual answer]
  16. 39.

    Text query → NLP [String ⇒ Intent] → Business Logic [visual answer]
    <wake word> <voice command>
    Siri Shortcut [start listening] → Speech FW [voice command (audio) ⇒ String] → Business Logic [textual answer] → AVSpeechSynthesizer
  17. 40.

    <wake word> <voice command>
    Wake Word Detection [start listening] → STT [voice command (audio) ⇒ String] → NLP [String ⇒ Intent] → Business Logic [textual answer] → TTS
    https://developer.apple.com/documentation/sirikit
  18. 42.

    Getting Started
    ➔ Think about your use case
      ◆ Not every use case should exist on voice
      ◆ Hands-free actions (car, cooking)
      ◆ Search feature
    ➔ Think about your users and what services they're currently using
      ◆ If several platforms: consider a 3rd-party solution
      ◆ If mostly mobile: consider 1st party
  19. 43.

    Last Word
    ➔ This talk was about technical solutions
    ➔ You should spend a lot of time designing the conversations and interactions with the user
      ◆ VUI/VUX