Building Voice-First iOS Apps

elainedb
October 08, 2019

Thanks to recent advances in machine learning, we can now interact with machines through natural language. The age of voice assistants is here, with Siri, Alexa and others. But as an iOS developer, what can I do to add conversational features to my existing app?
When we think about developing voice-forward features, we think of existing voice assistants such as Alexa and Siri. But what about the fully capable computers we carry with us all the time, our smartphones? Some moments in our day-to-day lives are very well suited to voice interactions: while in a car or cooking, for example. Let's not forget that voice interactions are also extremely accessible, not only physically (for people with dexterity or motion impairments) but also cognitively (I think we all have a loved one who really struggles with technology, and people in some emerging countries have very limited access to computers and are not at ease with technology).

In this talk, I'll explain what integrations can be done in iOS:
- 1st-party solutions such as the Natural Language Framework and Siri Shortcuts
- 3rd-party solutions such as Porcupine, Snips, Dialogflow, Amazon Lex, RASA and many others

In summary, this talk will help you think about why and how you should implement conversational features in your app.

Transcript

  1. @elainedbatista Building Voice-First iOS Apps

  2. @elainedbatista Elaine Dias Batista

  3. Why?

  4. @elainedbatista

  5. @elainedbatista

  6. @elainedbatista

  7. What do I need to do?

  8. Hotword / Wake Word detection
    Speech-to-Text / Speech Recognition
    NLP / NLU
    Text-to-Speech / Voice Synthesis

  9. <wake word> <voice command>
    Wake Word Detection (Start listening) → STT (Voice command (audio) ⇒ String) → NLP (String ⇒ Intent) → Business Logic (Textual answer) → TTS
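
The five stages on this slide can be sketched as plain Swift closures chained together. This is only an illustration of the data flow (audio ⇒ String ⇒ Intent ⇒ textual answer), not code from the talk: `Intent`, `VoicePipeline` and every closure name are hypothetical, and a real app would plug in one of the engines discussed later in the deck.

```swift
import Foundation

/// Result of the NLP step (String ⇒ Intent). Hypothetical type for this sketch.
struct Intent: Equatable {
    let name: String
    let parameters: [String: String]
}

/// One closure per box on the slide, so any engine can be swapped in.
struct VoicePipeline {
    let detectWakeWord: ([Float]) -> Bool    // Wake Word Detection
    let speechToText: ([Float]) -> String    // STT: audio ⇒ String
    let parseIntent: (String) -> Intent?     // NLP: String ⇒ Intent
    let businessLogic: (Intent) -> String    // Intent ⇒ textual answer
    let speak: (String) -> Void              // TTS

    /// Runs the whole chain. Returns the textual answer that was spoken,
    /// or nil when no wake word was detected (nothing is spoken then).
    @discardableResult
    func handle(wakeAudio: [Float], commandAudio: [Float]) -> String? {
        guard detectWakeWord(wakeAudio) else { return nil }
        let text = speechToText(commandAudio)
        let answer = parseIntent(text).map(businessLogic)
            ?? "Sorry, I didn't understand that."
        speak(answer)
        return answer
    }
}
```

Keeping each stage behind a closure makes it easy to replace, for example, a cloud NLP service with an on-device one without touching the business logic.
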
  10. How?

  11. 3 strategies
    1. Integrate with an existing platform
    2. Integrate in an existing app
    3. In-house development

  12. Integrate with an existing platform
    ➔ Voice
      ◆ Google Assistant
      ◆ Alexa
    ➔ Chat
      ◆ Facebook Messenger
      ◆ Slack
      ◆ Telegram

  13. Integrate in an existing app
    ➔ 1st party APIs
    ➔ 3rd party SDKs

  14. In-house development

  15. Voice on iOS

  16. Voice Interactions on iOS (timeline: 2007, 2009, 2011, 2013, 2016, 2019)
    VoiceOver, Speech / Siri, AVSpeechSynthesizer, SiriKit (Intents, Shortcuts), Speech Framework, NL Framework, Voice Controls, Core ML
    Legend: iOS Feature, iOS API / Framework

  17. 1st party solutions
    - Using APIs and Frameworks
    - Using the Intents Extension

  18. 1st party solutions
    - Using APIs and Frameworks
    - Using the Intents Extension

  19. <wake word> <voice command>
    Wake Word Detection (Start listening) → STT (Voice command (audio) ⇒ String) → NLP (String ⇒ Intent) → Business Logic (Textual answer) → TTS

  20. <wake word> <voice command>
    Wake Word Detection (Start listening) → STT (Voice command (audio) ⇒ String) → NLP (String ⇒ Intent) → Business Logic (Textual answer) → TTS
    Siri Shortcuts (Wake Word Detection), Speech Framework (STT), Natural Language Framework (NLP), AVSpeechSynthesizer (TTS)

  21. Siri Shortcuts
    ➔ Take advantage of Siri to:
      ◆ Perform actions on your app (inside Siri)
      ◆ Open your app to a specific screen
    ➔ Integrate it by:
      ◆ Declaring an Intent Definition
      ◆ Donating the intent so Siri can learn your user's behaviors and suggest your shortcut
      ◆ Adding phrases to Siri with INUIAddVoiceShortcutButton (Add to Siri)
    ➔ Make your app accessible from:
      ◆ Spotlight search
      ◆ Lock screen
      ◆ Siri watch face
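
A minimal sketch of the "open your app to a specific screen" case follows. Note that it swaps in the simpler NSUserActivity route instead of the Intent Definition route named on the slide, because a custom intent class is generated by Xcode from the definition file and cannot be shown standalone. The activity type, title and invocation phrase are made-up examples, and the sketch assumes iOS 12 or later.

```swift
#if os(iOS)
import Intents
import IntentsUI
import UIKit

/// Builds an activity that Siri may suggest as a shortcut.
/// The activity type string is a hypothetical example and would also need
/// to be listed under NSUserActivityTypes in the app's Info.plist.
@available(iOS 12.0, *)
func makeOrderStatusActivity() -> NSUserActivity {
    let activity = NSUserActivity(activityType: "com.example.voicedemo.order-status")
    activity.title = "Check order status"
    activity.isEligibleForSearch = true          // Spotlight search
    activity.isEligibleForPrediction = true      // Siri suggestions
    activity.suggestedInvocationPhrase = "Where is my order"
    return activity
}

/// Donates the activity by attaching it to the screen the user is viewing.
@available(iOS 12.0, *)
func donate(_ activity: NSUserActivity, from viewController: UIViewController) {
    viewController.userActivity = activity
    activity.becomeCurrent()
}

/// Creates the "Add to Siri" button for the same activity. The caller still
/// has to set the button's delegate to present the recording screen.
@available(iOS 12.0, *)
func makeAddToSiriButton(for activity: NSUserActivity) -> INUIAddVoiceShortcutButton {
    let button = INUIAddVoiceShortcutButton(style: .whiteOutline)
    button.shortcut = INShortcut(userActivity: activity)
    return button
}
#endif
```

With a custom intent the donation step would instead go through an INInteraction, but the overall shape (describe the action, donate it when the user performs it, offer an Add to Siri button) stays the same.
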
  22. Siri Shortcuts https://www.macstories.net/stories/ios-and-ipados-13-the-macstories-review/13/

  23. Speech Framework
    ➔ Live or prerecorded audio
    ➔ One minute limit (battery, network)
    ➔ iOS 13+: supportsOnDeviceRecognition property
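
As an illustration of the prerecorded-audio case and the iOS 13 on-device option, here is a hedged sketch. `FileTranscriber` and `TranscriptionError` are hypothetical names, the locale is an arbitrary choice, and the app is assumed to declare NSSpeechRecognitionUsageDescription in its Info.plist.

```swift
#if canImport(Speech)
import Foundation
import Speech

enum TranscriptionError: Error {
    case notAuthorized
    case recognizerUnavailable
}

/// Transcribes a prerecorded audio file (Voice command (audio) ⇒ String).
@available(iOS 13.0, macOS 10.15, *)
final class FileTranscriber {
    // Kept as properties so the recognizer and task outlive the call.
    private let recognizer: SFSpeechRecognizer?
    private var task: SFSpeechRecognitionTask?

    init(locale: Locale = Locale(identifier: "en-US")) {
        recognizer = SFSpeechRecognizer(locale: locale)
    }

    func transcribe(fileAt url: URL,
                    completion: @escaping (Result<String, Error>) -> Void) {
        // The authorization callback may arrive on a background queue.
        SFSpeechRecognizer.requestAuthorization { [weak self] status in
            guard let self = self else { return }
            guard status == .authorized else {
                completion(.failure(TranscriptionError.notAuthorized))
                return
            }
            guard let recognizer = self.recognizer, recognizer.isAvailable else {
                completion(.failure(TranscriptionError.recognizerUnavailable))
                return
            }
            let request = SFSpeechURLRecognitionRequest(url: url)
            // Stay on device when the recognizer supports it for this locale.
            if recognizer.supportsOnDeviceRecognition {
                request.requiresOnDeviceRecognition = true
            }
            self.task = recognizer.recognitionTask(with: request) { result, error in
                if let error = error {
                    completion(.failure(error))
                } else if let result = result, result.isFinal {
                    completion(.success(result.bestTranscription.formattedString))
                }
            }
        }
    }
}
#endif
```

For live audio the same recognizer would be fed by an SFSpeechAudioBufferRecognitionRequest instead of a URL request.
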
  24. Natural Language Framework
    ➔ Tokenization
      ◆ Enumerates the words in a string
    ➔ Language identification
    ➔ Linguistic Tags
      ◆ Classify nouns, verbs, adjectives, and other parts of speech in a string.
      ◆ Use a linguistic tagger to perform named entity recognition on a string.
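
The three capabilities on this slide map onto NLTokenizer, NLLanguageRecognizer and NLTagger. The helper functions below are illustrative names, not part of the framework, and the sketch assumes iOS 12 or macOS 10.14 and later.

```swift
#if canImport(NaturalLanguage)
import NaturalLanguage

/// Tokenization: enumerates the words in a string.
@available(iOS 12.0, macOS 10.14, *)
func words(in text: String) -> [String] {
    let tokenizer = NLTokenizer(unit: .word)
    tokenizer.string = text
    var result: [String] = []
    tokenizer.enumerateTokens(in: text.startIndex..<text.endIndex) { range, _ in
        result.append(String(text[range]))
        return true // keep enumerating
    }
    return result
}

/// Language identification: returns a language code, or nil if undetermined.
@available(iOS 12.0, macOS 10.14, *)
func dominantLanguage(of text: String) -> String? {
    NLLanguageRecognizer.dominantLanguage(for: text)?.rawValue
}

/// Linguistic tags: pass .lexicalClass for parts of speech,
/// or .nameType for named entity recognition.
@available(iOS 12.0, macOS 10.14, *)
func tags(in text: String, scheme: NLTagScheme) -> [(token: String, tag: String)] {
    let tagger = NLTagger(tagSchemes: [scheme])
    tagger.string = text
    let options: NLTagger.Options = [.omitPunctuation, .omitWhitespace, .joinNames]
    var result: [(token: String, tag: String)] = []
    tagger.enumerateTags(in: text.startIndex..<text.endIndex,
                         unit: .word, scheme: scheme, options: options) { tag, range in
        if let tag = tag {
            result.append((token: String(text[range]), tag: tag.rawValue))
        }
        return true
    }
    return result
}
#endif
```

In a voice feature these would typically run on the string returned by the speech recognizer, for instance to pull a verb and a named entity out of the command before mapping it to an intent.
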
  25. Natural Language Framework
    ➔ Text Embedding
    ➔ Natural Language Models
      ◆ Custom models: Create ML
        • Create and train custom ML models on your Mac
          https://developer.apple.com/documentation/createml
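
One possible use of text embeddings in a voice feature is matching a spoken word against a small list of known keywords. The sketch below assumes the built-in English word embedding (iOS 13 or macOS 10.15 and later); `closestWord` is a hypothetical helper, and words missing from the embedding's vocabulary will simply rank poorly.

```swift
#if canImport(NaturalLanguage)
import NaturalLanguage

/// Returns the candidate whose embedding is nearest to `word` by cosine
/// distance, or nil when there are no candidates or no English embedding.
@available(iOS 13.0, macOS 10.15, *)
func closestWord(to word: String, among candidates: [String]) -> String? {
    guard let embedding = NLEmbedding.wordEmbedding(for: .english) else {
        return nil
    }
    // Embedding lookups are case sensitive, so compare lowercased forms.
    let query = word.lowercased()
    return candidates.min { lhs, rhs in
        embedding.distance(between: query, and: lhs.lowercased(), distanceType: .cosine)
            < embedding.distance(between: query, and: rhs.lowercased(), distanceType: .cosine)
    }
}
#endif
```

For anything beyond keyword matching, a custom text classifier trained with Create ML is the route the slide points to.
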
  26. AVSpeechSynthesizer

  27. AVSpeechSynthesizer https://nshipster.com/avspeechsynthesizer/
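
For the TTS step, a minimal AVSpeechSynthesizer sketch could look like this. `Speaker` is a hypothetical wrapper; the language code and speech rate are arbitrary defaults chosen for the example.

```swift
#if canImport(AVFoundation)
import AVFoundation

/// Speaks textual answers aloud (the TTS box of the pipeline).
@available(iOS 7.0, macOS 10.14, *)
final class Speaker {
    // Held as a property: speech stops if the synthesizer is deallocated.
    private let synthesizer = AVSpeechSynthesizer()

    /// Builds the utterance separately so it can be inspected or customised.
    func makeUtterance(_ text: String, language: String = "en-US") -> AVSpeechUtterance {
        let utterance = AVSpeechUtterance(string: text)
        utterance.voice = AVSpeechSynthesisVoice(language: language)
        utterance.rate = AVSpeechUtteranceDefaultSpeechRate
        return utterance
    }

    /// Queues the text to be spoken; utterances are played in order.
    func say(_ text: String, language: String = "en-US") {
        synthesizer.speak(makeUtterance(text, language: language))
    }
}
#endif
```

Calling `Speaker().say("Your order ships tomorrow")` from a short-lived local would likely be cut off, which is why the synthesizer lives in an object the app keeps around.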

  28. 1st party solutions
    - Using APIs and Frameworks
    - Using the Intents Extension

  29. <wake word> <voice command>
    Wake Word Detection (Start listening) → STT (Voice command (audio) ⇒ String) → NLP (String ⇒ Intent) → Business Logic (Textual answer) → TTS

  30. <wake word> <voice command>
    Wake Word Detection (Start listening) → STT (Voice command (audio) ⇒ String) → NLP (String ⇒ Intent) → Business Logic (Textual answer) → TTS
    https://developer.apple.com/documentation/sirikit

  31. Siri Domains https://developer.apple.com/documentation/sirikit

  32. 3rd party solutions

  33. Hotword / Wake Word detection
      • Picovoice
      • Snowboy
      • Snips
      • OpenEars
    *In-App detection
    NLP / NLU
      • Picovoice
      • Snips
      • OpenEars
      • RASA NLU
      • Tock (by Voyages SNCF)
      • Amazon Lex
      • IBM Watson
      • Microsoft
      • Wit.ai (by Facebook)
      • Dialogflow (by Google)
    Legend: Offline, Internet connection required, On-premise setup available

  34. Cross-platform advantages: Dialogflow

  35. Cross-platform advantages: Picovoice

  36. Wrapping up

  37. <wake word> <voice command>
    Wake Word Detection (Start listening) → STT (Voice command (audio) ⇒ String) → NLP (String ⇒ Intent) → Business Logic (Textual answer) → TTS

  38. App closed: <wake word> <voice command>
    Siri Shortcut (Start listening) → Speech FW (Voice command (audio) ⇒ String) → Snips (String ⇒ Intent) → Business Logic (Textual answer) → AVSpeechSynthesizer
    App open: <wake word> <voice command>
    Picovoice (Start listening) → Speech FW (Voice command (audio) ⇒ String) → OpenEars (String ⇒ Intent) → Business Logic → Visual answer

  39. NLP Business Logic String ⇒ Intent Visual answer Text query <wake word> <voice command> Siri Shortcut Speech FW Start listening Voice command (audio) ⇒ String Business Logic AVSpeechSynthesizer Textual answer

  40. <wake word> <voice command>
    Wake Word Detection (Start listening) → STT (Voice command (audio) ⇒ String) → NLP (String ⇒ Intent) → Business Logic (Textual answer) → TTS
    https://developer.apple.com/documentation/sirikit

  41. Getting Started

  42. Getting Started
    ➔ Think about your use case
      ◆ Not every use case should exist on voice
      ◆ Hands free actions (car, cooking)
      ◆ Search feature
    ➔ Think about your users and what services they're currently using
      ◆ If several platforms: consider a 3rd-party solution
      ◆ If mostly mobile: consider 1st party

  43. Last Word
    ➔ This talk was about technical solutions
    ➔ You should spend a lot of time designing the conversations and interactions with users
      ◆ VUI/VUX

  44. Thanks! @elainedbatista