Building Voice-First iOS Apps

elainedb
October 08, 2019

Thanks to recent advances in machine learning, we can now interact with machines through natural language. The age of voice assistants is here, with Siri, Alexa and others. But as an iOS developer, what can I do in my existing app to add conversational features?
When we think about developing voice-forward features, we think of existing voice assistants such as Alexa and Siri. But what about the fully capable computers we carry with us all the time: our smartphones? Some moments in our day-to-day lives are very well suited to voice interaction, for example while in a car or cooking. Let's not forget that voice interactions are extremely accessible, not only physically (for people with dexterity or motor impairments) but also cognitively (I think we all have a loved one who really struggles with technology, and people in some emerging countries have very limited access to computers and are not at ease with technology).

In this talk, I'll explain which integrations are possible on iOS:
- 1st-party solutions such as the Natural Language Framework and Siri Shortcuts
- 3rd-party solutions such as Porcupine, Snips, Dialogflow, Amazon Lex, RASA and many others

In summary, this talk will help you think about why you should implement conversational features in your app, and how.

Transcript

  1. @elainedbatista
    Building Voice-First iOS Apps

  2. Elaine Dias Batista

  3. Why?

  4. [image only]

  5. [image only]

  6. [image only]

  7. What do I need to do?

  8.
    ➔ Hotword / Wake Word detection
    ➔ Speech-to-Text / Speech Recognition
    ➔ NLP / NLU
    ➔ Text-to-Speech / Voice Synthesis

  9. Voice interaction pipeline
    Wake Word Detection ➔ start listening
    STT: voice command (audio) ⇒ String
    NLP: String ⇒ Intent
    Business Logic ➔ textual answer
    TTS: textual answer ⇒ speech
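The pipeline on this slide can be sketched in Swift as one protocol per stage, with the business logic as a plain closure. This is a minimal illustration, not the talk's implementation: every type name is hypothetical, audio is reduced to `[Float]` samples, and the stub conformances stand in for real engines (a wake word SDK, the Speech framework, an NLU service, a speech synthesizer).

```swift
// One protocol per pipeline stage. All names are hypothetical.
protocol WakeWordDetector {
    /// Returns true when the hotword is heard, i.e. when to start listening.
    func detectWakeWord(in audio: [Float]) -> Bool
}

protocol SpeechToText {
    /// Voice command (audio) ⇒ String
    func transcribe(_ audio: [Float]) -> String
}

struct Intent: Equatable {
    let name: String
    let slots: [String: String]
}

protocol NaturalLanguageUnderstanding {
    /// String ⇒ Intent, or nil when nothing matches.
    func parse(_ text: String) -> Intent?
}

protocol TextToSpeech {
    /// Textual answer ⇒ audio output.
    func speak(_ text: String)
}

struct VoicePipeline {
    let wakeWord: WakeWordDetector
    let stt: SpeechToText
    let nlu: NaturalLanguageUnderstanding
    let tts: TextToSpeech
    /// Business logic: turns an intent into a textual answer.
    let handle: (Intent) -> String

    /// Runs one conversational turn; returns the answer, or nil if no wake word was heard.
    @discardableResult
    func process(wakeAudio: [Float], commandAudio: [Float]) -> String? {
        guard wakeWord.detectWakeWord(in: wakeAudio) else { return nil }
        let text = stt.transcribe(commandAudio)
        let answer = nlu.parse(text).map(handle) ?? "Sorry, I didn't get that."
        tts.speak(answer)
        return answer
    }
}

// Stubs standing in for real engines, so the flow can run without audio hardware.
struct StubWakeWord: WakeWordDetector {
    // Pretends the hotword was heard whenever any audio is present.
    func detectWakeWord(in audio: [Float]) -> Bool { !audio.isEmpty }
}

struct StubTranscriber: SpeechToText {
    let transcript: String
    func transcribe(_ audio: [Float]) -> String { transcript }
}

struct KeywordNLU: NaturalLanguageUnderstanding {
    // Naive keyword matching; a real app would call an NLU engine instead.
    func parse(_ text: String) -> Intent? {
        text.lowercased().contains("timer") ? Intent(name: "SetTimer", slots: [:]) : nil
    }
}

final class RecordingTTS: TextToSpeech {
    private(set) var spoken: [String] = []
    func speak(_ text: String) { spoken.append(text) }
}
```

Replacing a stub with a real engine only changes the conforming type; `VoicePipeline` itself stays the same, which is what makes mixing 1st party and 3rd party building blocks practical.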

  10. How?

  11. 3 strategies
    1. Integrate with an existing platform
    2. Integrate in an existing app
    3. In-house development

  12. Integrate with an existing platform
    ➔ Voice
    ◆ Google Assistant
    ◆ Alexa
    ➔ Chat
    ◆ Facebook Messenger
    ◆ Slack
    ◆ Telegram

  13. Integrate in an existing app
    ➔ 1st party APIs
    ➔ 3rd party SDKs

  14. In-house development

  15. Voice on iOS

  16. Voice Interactions on iOS
    Timeline (2007 to 2019) of iOS features and iOS APIs / frameworks: VoiceOver, Speech / Siri, AVSpeechSynthesizer, SiriKit (Intents, Shortcuts), Speech Framework, NL Framework, Voice Control, Core ML

  17. 1st party solutions
    - Using APIs and Frameworks
    - Using the Intents Extension

  18. 1st party solutions
    - Using APIs and Frameworks
    - Using the Intents Extension

  19. [Pipeline diagram, as on slide 9]

  20. Pipeline from slide 9, mapped to 1st party solutions
    ➔ Wake Word Detection: Siri Shortcuts
    ➔ STT: Speech Framework
    ➔ NLP: Natural Language Framework
    ➔ TTS: AVSpeechSynthesizer

  21. Siri Shortcuts
    ➔ Take advantage of Siri to:
    ◆ Perform actions in your app (inside Siri)
    ◆ Open your app to a specific screen
    ➔ Integrate it by:
    ◆ Declaring an Intent Definition
    ◆ Donating the intent so Siri can learn your user's behaviors and suggest your shortcut
    ◆ Adding phrases to Siri with INUIAddVoiceShortcutButton (Add to Siri)
    ➔ Make your app accessible from:
    ◆ Spotlight search
    ◆ Lock screen
    ◆ Siri watch face
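The route on this slide declares a custom intent in an Intent Definition file; its Xcode-generated classes can't be reproduced in a self-contained snippet, so this sketch uses the other donation mechanism, `NSUserActivity`. The activity type, title, phrase and `userInfo` payload are hypothetical placeholders. Intent-based donations go through `INInteraction(intent:response:).donate(completion:)` instead.

```swift
import Foundation
import Intents

// Hypothetical activity type; it must also be listed under
// NSUserActivityTypes in the app's Info.plist.
let startTimerActivityType = "com.example.voiceapp.start-timer"

/// Describes an action the user just performed, so Siri can learn it
/// and suggest it as a shortcut.
func makeStartTimerActivity() -> NSUserActivity {
    let activity = NSUserActivity(activityType: startTimerActivityType)
    activity.title = "Start a cooking timer"
    activity.userInfo = ["minutes": 10]   // parameters needed to replay the action
    activity.isEligibleForSearch = true   // surfaces it in Spotlight search
    #if os(iOS)
    if #available(iOS 12.0, *) {
        activity.isEligibleForPrediction = true   // lets Siri suggest it as a shortcut
        activity.persistentIdentifier = "start-timer"
        // Phrase proposed to the user in the Add to Siri sheet.
        activity.suggestedInvocationPhrase = "Start my timer"
    }
    #endif
    return activity
}

// Donating from a view controller (iOS): assigning the activity to
// `userActivity` makes it current while the screen is visible.
//     userActivity = makeStartTimerActivity()
//
// Offering an Add to Siri button (IntentsUI, iOS 12+):
//     let button = INUIAddVoiceShortcutButton(style: .blackOutline)
//     button.shortcut = INShortcut(userActivity: makeStartTimerActivity())
```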

  22. Siri Shortcuts
    https://www.macstories.net/stories/ios-and-ipados-13-the-macstories-review/13/

  23. Speech Framework
    ➔ Live or prerecorded audio
    ➔ One-minute audio limit per request (battery, network)
    ➔ iOS 13+: supportsOnDeviceRecognition property
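A minimal sketch of the STT step for prerecorded audio, assuming iOS 13 (for the on-device properties) and an en-US locale. `CommandTranscriber` is a hypothetical name; the app also needs the `NSSpeechRecognitionUsageDescription` Info.plist key and the user's permission via `SFSpeechRecognizer.requestAuthorization`.

```swift
import Foundation
import Speech

/// Turns a recorded voice command into text: voice command (audio) ⇒ String.
@available(iOS 13.0, macOS 10.15, *)
final class CommandTranscriber {
    private let recognizer: SFSpeechRecognizer?
    private var task: SFSpeechRecognitionTask?   // kept alive while recognition runs

    init(locale: Locale = Locale(identifier: "en-US")) {
        // nil if the locale is not supported.
        recognizer = SFSpeechRecognizer(locale: locale)
    }

    /// Builds a request for prerecorded audio. Live audio would use
    /// SFSpeechAudioBufferRecognitionRequest fed from AVAudioEngine instead.
    func makeRequest(for url: URL) -> SFSpeechURLRecognitionRequest {
        let request = SFSpeechURLRecognitionRequest(url: url)
        request.shouldReportPartialResults = false   // only the final transcription
        // Keep the audio on the device when this locale supports it.
        if recognizer?.supportsOnDeviceRecognition == true {
            request.requiresOnDeviceRecognition = true
        }
        return request
    }

    /// Calls completion with the final transcription, or nil on failure.
    func transcribe(fileAt url: URL, completion: @escaping (String?) -> Void) {
        guard let recognizer = recognizer, recognizer.isAvailable else {
            completion(nil)
            return
        }
        task = recognizer.recognitionTask(with: makeRequest(for: url)) { result, error in
            if let result = result, result.isFinal {
                completion(result.bestTranscription.formattedString)
            } else if error != nil {
                completion(nil)
            }
        }
    }
}
```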

  24. Natural Language Framework
    ➔ Tokenization
    ◆ Enumerates the words in a string
    ➔ Language identification
    ➔ Linguistic tags
    ◆ Classify nouns, verbs, adjectives, and other parts of speech in a string.
    ◆ Use a linguistic tagger to perform named entity recognition on a string.
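The three capabilities above map to `NLTokenizer`, `NLLanguageRecognizer` and `NLTagger`. A minimal sketch, assuming iOS 12 or later; the helper function names are hypothetical.

```swift
import NaturalLanguage

/// Tokenization: enumerates the words in a string.
func words(in text: String) -> [String] {
    let tokenizer = NLTokenizer(unit: .word)
    tokenizer.string = text
    return tokenizer.tokens(for: text.startIndex..<text.endIndex).map { String(text[$0]) }
}

/// Language identification: a BCP 47 code such as "en", or nil if undetermined.
func dominantLanguage(of text: String) -> String? {
    NLLanguageRecognizer.dominantLanguage(for: text)?.rawValue
}

/// Linguistic tags for each word with the given scheme:
/// .lexicalClass for parts of speech, .nameType for named entities.
func tags(in text: String, scheme: NLTagScheme) -> [(word: String, tag: String)] {
    let tagger = NLTagger(tagSchemes: [scheme])
    tagger.string = text
    var result: [(word: String, tag: String)] = []
    let options: NLTagger.Options = [.omitWhitespace, .omitPunctuation, .joinNames]
    tagger.enumerateTags(in: text.startIndex..<text.endIndex,
                         unit: .word,
                         scheme: scheme,
                         options: options) { tag, range in
        if let tag = tag {
            result.append((word: String(text[range]), tag: tag.rawValue))
        }
        return true // keep enumerating
    }
    return result
}
```

Passing `.nameType` instead of `.lexicalClass` yields named-entity tags such as `PersonalName`, `PlaceName` or `OrganizationName`, with all other words tagged `OtherWord`.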

  25. Natural Language Framework
    ➔ Text Embedding
    ➔ Natural Language Models
    ◆ Custom models: Create ML
    ● Create and train custom ML models on your Mac
    https://developer.apple.com/documentation/createml
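A text classifier trained with Create ML can cover the NLP step (String ⇒ Intent) through `NLModel`. This sketch assumes a hypothetical model named `IntentClassifier`, trained on example utterances labelled with intent names and added to the Xcode project, which compiles it into the bundle as `IntentClassifier.mlmodelc`.

```swift
import Foundation
import NaturalLanguage

/// Wraps a Create ML text classifier whose labels are intent names
/// (e.g. "SetTimer", "PlayMusic"; the labels are hypothetical).
struct IntentClassifier {
    private let model: NLModel

    /// Throws if the compiled model cannot be loaded from `modelURL`.
    init(modelURL: URL) throws {
        model = try NLModel(contentsOf: modelURL)
    }

    /// String ⇒ Intent label, or nil if the model returns no label.
    func intent(for utterance: String) -> String? {
        model.predictedLabel(for: utterance)
    }
}

// Typical use inside the app:
// if let url = Bundle.main.url(forResource: "IntentClassifier", withExtension: "mlmodelc"),
//    let classifier = try? IntentClassifier(modelURL: url) {
//     let label = classifier.intent(for: "set a timer for ten minutes")
// }
```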

  26. AVSpeechSynthesizer

  27. AVSpeechSynthesizer
    https://nshipster.com/avspeechsynthesizer/
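The TTS step with `AVSpeechSynthesizer`, as a minimal sketch; `AnswerSpeaker` and its methods are hypothetical names. The synthesizer is held in a property rather than created per call, since speech may be cut short if the synthesizer is deallocated while still talking.

```swift
import AVFoundation

/// Speaks textual answers aloud (textual answer ⇒ speech).
final class AnswerSpeaker {
    private let synthesizer = AVSpeechSynthesizer()

    /// Builds an utterance in the given language at the default rate.
    func makeUtterance(for text: String, languageCode: String = "en-US") -> AVSpeechUtterance {
        let utterance = AVSpeechUtterance(string: text)
        // nil (unknown language) falls back to the system default voice.
        utterance.voice = AVSpeechSynthesisVoice(language: languageCode)
        utterance.rate = AVSpeechUtteranceDefaultSpeechRate
        return utterance
    }

    func speak(_ text: String) {
        synthesizer.speak(makeUtterance(for: text))
    }

    /// Interrupts the current answer, e.g. when the user starts talking again.
    func stop() {
        synthesizer.stopSpeaking(at: .immediate)
    }
}
```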

  28. 1st party solutions
    - Using APIs and Frameworks
    - Using the Intents Extension

  29. [Pipeline diagram, as on slide 9]

  30. [Pipeline diagram, as on slide 9]
    https://developer.apple.com/documentation/sirikit

  31. Siri Domains
    https://developer.apple.com/documentation/sirikit

  32. 3rd party solutions

  33.
    Hotword / Wake Word detection*
    ● Picovoice
    ● Snowboy
    ● Snips
    ● OpenEars
    *In-app detection
    NLP / NLU
    ● Picovoice
    ● Snips
    ● OpenEars
    ● RASA NLU
    ● Tock (by Voyages SNCF)
    ● Amazon Lex
    ● IBM Watson
    ● Microsoft
    ● Wit.ai (by Facebook)
    ● Dialogflow (by Google)
    Legend: Offline / Internet connection required / On-premise setup available

  34. Cross-platform advantages
    Dialogflow
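Dialogflow is reached over a REST API, which is what makes it cross-platform: the same call works from iOS, Android or the web. This minimal sketch only builds a Dialogflow v2 `detectIntent` request with Foundation; `projectID`, `sessionID` and `accessToken` are placeholders, and obtaining an OAuth access token is out of scope.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking   // URLRequest lives here on Linux
#endif

/// Builds a Dialogflow v2 detectIntent request for a text query (String ⇒ Intent).
func makeDetectIntentRequest(projectID: String,
                             sessionID: String,
                             accessToken: String,
                             text: String,
                             languageCode: String = "en-US") throws -> URLRequest {
    let endpoint = "https://dialogflow.googleapis.com/v2/projects/\(projectID)/agent/sessions/\(sessionID):detectIntent"
    guard let url = URL(string: endpoint) else { throw URLError(.badURL) }

    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    request.setValue("Bearer \(accessToken)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json; charset=utf-8", forHTTPHeaderField: "Content-Type")

    // Body shape: { "queryInput": { "text": { "text": ..., "languageCode": ... } } }
    let body: [String: Any] = [
        "queryInput": ["text": ["text": text, "languageCode": languageCode]]
    ]
    request.httpBody = try JSONSerialization.data(withJSONObject: body)
    return request
}
```

The request can be sent with `URLSession`; the matched intent and reply text come back under `queryResult` (`intent.displayName`, `fulfillmentText`). In production you would usually route this call through your own backend rather than ship credentials inside the app.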

  35. Cross-platform advantages
    Picovoice

  36. Wrapping up

  37. [Pipeline diagram, as on slide 9]

  38.
    App closed:
    Siri Shortcut (wake word) ➔ start listening ➔ Speech FW (voice command (audio) ⇒ String) ➔ Snips (String ⇒ Intent) ➔ Business Logic ➔ textual answer ➔ AVSpeechSynthesizer
    App open:
    Picovoice (wake word) ➔ start listening ➔ Speech FW (voice command (audio) ⇒ String) ➔ OpenEars (String ⇒ Intent) ➔ Business Logic ➔ visual answer

  39.
    Text query ➔ NLP (String ⇒ Intent) ➔ Business Logic ➔ visual answer
    Siri Shortcut ➔ start listening ➔ Speech FW (voice command (audio) ⇒ String) ➔ Business Logic ➔ textual answer ➔ AVSpeechSynthesizer

  40. [Pipeline diagram, as on slide 9]
    https://developer.apple.com/documentation/sirikit

  41. Getting Started

  42. Getting Started
    ➔ Think about your use case
    ◆ Not every use case belongs on voice
    ◆ Hands-free actions (car, cooking)
    ◆ Search feature
    ➔ Think about your users and the services they're currently using
    ◆ If several platforms: consider a 3rd party solution
    ◆ If mostly mobile: consider 1st party

  43. Last Word
    ➔ This talk was about technical solutions
    ➔ You should spend a lot of time designing the conversations and interactions with the user
    ◆ VUI/VUX

  44. Thanks!