[Tinel] Voice: The New Frontend

Nara Kasbergen

September 05, 2019

Transcript

  1. Voice: The New
    Frontier
    Frontend
    Nara Kasbergen (@xiehan)
    #TinelMeetup
    Thursday, September 5, 2019

  2. What is NPR?
    a nationwide network of public radio stations

  3. Why voice?
    Then:
    Now:

  4. What we're covering today
    1. Vision of the future
    2. How we got here
    3. Where we are today
    4. How to build for voice today
    5. Lessons learned
    6. AMA (Ask Me Anything)

  5. Vision of the future
    1

  6. Prediction:
    In the future, we
    will all be voice
    developers.

  7. How we got here
    2

  8. Natural Language Processing
    definition:
    the ability of a computer program to understand human
    language as it is spoken. NLP is a component of artificial
    intelligence (AI). (source)
    Origins: circa 1960
    First major advances: the 1980s

  9. Machine Learning (ML)
    definition:
    Machine learning is an application of artificial intelligence (AI)
    that provides systems the ability to automatically learn and
    improve from experience without being explicitly programmed.
    Machine learning focuses on the development of computer
    programs that can access data and use it to learn for
    themselves. (source)
    Origins: the 1950s
    First major advances: the 1980s

  10. https://hackernoon.com/moores-law-is-alive-and-well-adc010ea7a63

  11. 1987: Apple's Knowledge Navigator
    1994: IBM Simon is the world's first smartphone
    2009: Microsoft begins work on Cortana
    2010: Siri launches as a standalone iOS app, later acquired by Apple
    2012: Google Now

  12. 2014: Amazon Echo launches (November 6)
    2015: Amazon Alexa Skills Kit launches (June)
    2016: Google Assistant (May) + Google Home (November)
    2017: Samsung Bixby (August) + Microsoft Cortana (October)
    2018: Apple HomePod (February)

  13. “smart speakers”

  14. “smart speakers”

  15. Rise of multimodal devices

  16. Chatbots
    definition:
    a computer program or an artificial intelligence which conducts
    a conversation via auditory or textual methods. Such programs
    are often designed to convincingly simulate how a human
    would behave as a conversational partner, thereby passing the
    Turing test. (source)
    Origins: circa 1966 (ELIZA)
    First major advances: the early 2000s

  17. Google all-in on chatbot tech

  18. Where we are today
    3

  19. The major players

  20. The request/response flow
    [Diagram labels: request, your code, response]

  21. The request/response flow
    [Diagram labels: request, your code, response]
    P.S. all the NLP and ML happens here

  22. Voice for the web
    Web Speech API:
    ● Speech recognition
    ● Speech synthesis (TTS)
    W3C Community Group specification published in 2012
    SpeechRecognition interface is currently supported only in Chrome, as an experimental feature
    Uses Google's servers to convert speech to text (requires an Internet connection)
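
Because support is limited to Chrome, recognition code generally needs a feature check first. The following is a minimal sketch, not code from the talk: `SpeechRecognition` and `webkitSpeechRecognition` are the real interface names, while `getSpeechRecognition` and `startListening` are hypothetical helpers. The global object is passed in as a parameter so the logic can also be exercised outside a browser.

```javascript
// Hypothetical helper: returns the SpeechRecognition constructor if the
// environment exposes one (Chrome uses the webkit-prefixed name), else null.
function getSpeechRecognition(globalObject) {
  return globalObject.SpeechRecognition || globalObject.webkitSpeechRecognition || null;
}

// Hypothetical helper: starts one recognition session and passes the first
// transcript to onTranscript. Returns false when recognition is unsupported.
function startListening(globalObject, onTranscript) {
  const Recognition = getSpeechRecognition(globalObject);
  if (!Recognition) {
    return false;
  }
  const recognition = new Recognition();
  recognition.lang = 'en-US';
  recognition.onresult = (event) => {
    // First result, first (most likely) alternative.
    onTranscript(event.results[0][0].transcript);
  };
  recognition.start();
  return true;
}
```

In a browser this would be called as `startListening(window, (text) => console.log(text))`, with a fallback such as a text input when it returns `false`.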

  23. Mozilla Common Voice
    "Common Voice is Mozilla's initiative to help teach machines how real people speak." (source)
    Open, publicly available dataset
    Upload recordings of your voice
    Help reduce bias in Natural Language Processing (NLP) & machine learning

  24. Open source alternatives
    Mozilla Scout
    Mycroft

  25. The enterprise voice space
    NLP/ML/AI
    Consumer applications

  26. How to build for voice today
    4

  27. disclaimer
    it's not about the code

  28. The stack
    ● Node.js
    ● serverless (AWS Lambda or Google Cloud Functions)
    ● lightweight database (e.g. Amazon DynamoDB)
    ● CI server of choice (e.g. Travis, Jenkins, etc.)
    ● unit testing framework of choice (e.g. Jest)
    ● TypeScript…?

  29. Use the official SDKs
    Alexa Node.js SDK:
    github.com/alexa/alexa-skills-kit-sdk-for-nodejs
    Actions on Google Node.js SDK:
    github.com/actions-on-google/actions-on-google-nodejs

  30. Glossary
    Alexa             | Google / Dialogflow
    skill             | action / agent
    invocation name   | invocation name
    intent            | intent
    slot              | entity
    sample utterance  | training phrase

  31. “Alexa, ask NPR One
    to play the latest news”

  32. “Alexa, ask NPR One
    to play the latest news”
    invocation name: “NPR One”

  33. “Alexa, ask NPR One
    what's playing”

  34. “Alexa, ask NPR One
    what's playing”
    “Alexa, ask NPR One
    what am I listening to?”

  35. “Alexa, ask NPR One
    what's playing”
    “Alexa, ask NPR One
    what am I listening to?”
    one intent, two sample utterances

  36. “Alexa, ask NPR One
    to play Planet Money”
    “Alexa, ask NPR One
    to play Hidden Brain”

  37. “Alexa, ask NPR One
    to play Planet Money”
    “Alexa, ask NPR One
    to play Hidden Brain”
    intent: “play”
    slot (or entity): the show name (“Planet Money”, “Hidden Brain”)
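
To make the terms concrete, here is a rough sketch of how these utterances might be declared in an Alexa-style interaction model, written as a JavaScript object. The overall shape follows the Alexa interaction model JSON, but the intent, slot, and type names are hypothetical and are not NPR One's actual model. The `expandSamples` helper is also hypothetical; it only illustrates how a slot placeholder stands in for many values.

```javascript
// Hypothetical interaction model: one intent without slots, one with a slot.
const interactionModel = {
  interactionModel: {
    languageModel: {
      invocationName: 'npr one',
      intents: [
        {
          name: 'WhatsPlayingIntent',
          slots: [],
          samples: ["what's playing", 'what am i listening to'],
        },
        {
          name: 'PlayShowIntent',
          slots: [{ name: 'show', type: 'SHOW_NAME' }],
          samples: ['play {show}'],
        },
      ],
      types: [
        {
          name: 'SHOW_NAME',
          values: [
            { name: { value: 'Planet Money' } },
            { name: { value: 'Hidden Brain' } },
          ],
        },
      ],
    },
  },
};

// Expands each sample utterance of an intent with every value of its slot type.
// Utterances whose slot type is not declared in the model are dropped.
function expandSamples(model, intentName) {
  const { intents, types } = model.interactionModel.languageModel;
  const intent = intents.find((candidate) => candidate.name === intentName);
  if (!intent) {
    return [];
  }
  let utterances = intent.samples;
  for (const slot of intent.slots) {
    const placeholder = `{${slot.name}}`;
    const type = types.find((candidate) => candidate.name === slot.type);
    const values = type ? type.values.map((entry) => entry.name.value) : [];
    utterances = utterances.flatMap((utterance) =>
      utterance.includes(placeholder)
        ? values.map((value) => utterance.replace(placeholder, value))
        : [utterance]
    );
  }
  return utterances;
}
```

The platform's language understanding generalizes well beyond this kind of literal expansion; the helper is only meant to show how intents, slots, and sample utterances relate.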

  38. Basic code architecture
    JSON event with intent name and slot(s)
    Handler function mapped to that intent
    Use SDK to produce a response with speech, audio, etc.
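
The three steps above can be sketched in plain JavaScript without the SDK. This is an illustration under assumptions, not the code from the slides: the event and response shapes follow the Alexa request/response JSON format, the `canHandle`/`handle` pairing mirrors the pattern the Alexa SDK uses, and `PlayShowIntent`, the `show` slot, `speak`, `isIntent`, and `handleEvent` are hypothetical names.

```javascript
// Each handler declares which events it accepts (canHandle) and how it
// responds (handle). The first matching handler wins.
const handlers = [
  {
    canHandle: (event) => event.request.type === 'LaunchRequest',
    handle: () => speak('Welcome. What would you like to hear?', false),
  },
  {
    canHandle: (event) => isIntent(event, 'PlayShowIntent'),
    handle: (event) => {
      // Slot values arrive in the JSON event, keyed by slot name.
      const show = event.request.intent.slots.show.value;
      return speak(`Playing ${show}.`, true);
    },
  },
];

// True when the event is an intent request for the given intent name.
function isIntent(event, name) {
  return event.request.type === 'IntentRequest' && event.request.intent.name === name;
}

// Builds an Alexa-style response containing plain-text speech.
function speak(text, shouldEndSession) {
  return {
    version: '1.0',
    response: {
      outputSpeech: { type: 'PlainText', text },
      shouldEndSession,
    },
  };
}

// Routes the incoming JSON event to the handler mapped to it.
function handleEvent(event) {
  const handler = handlers.find((candidate) => candidate.canHandle(event));
  return handler ? handler.handle(event) : speak("Sorry, I didn't understand that.", false);
}
```

On AWS Lambda, a function like `handleEvent` would be exported as the handler. In practice the official SDKs take care of this routing and of building the response.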

  39. Alexa "Hello World" skill

  40. Alexa "Hello World" skill

  41. Alexa "Hello World" skill

  42. Alexa "Hello World" skill

  43. Google "Hello World" action

  44. Google "Hello World" action
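
The code shown on these slides is not part of the transcript. As a stand-in, here is a minimal sketch of what a "Hello World" fulfillment could look like for a Dialogflow agent, assuming the Dialogflow v2 webhook JSON format (`queryResult.intent.displayName` in the request, `fulfillmentText` in the response). The intent name `Hello World` and the function name `handleWebhook` are hypothetical, and the official SDK would normally wrap this logic.

```javascript
// Hypothetical fulfillment function: receives the parsed webhook request body
// and returns the JSON body to send back to Dialogflow.
function handleWebhook(body) {
  const intentName = body.queryResult.intent.displayName;
  if (intentName === 'Hello World') {
    return { fulfillmentText: 'Hello, world!' };
  }
  // Any intent without a dedicated branch gets a generic reply.
  return { fulfillmentText: "Sorry, I can't help with that yet." };
}
```

Deployed as a Google Cloud Function or similar, this would sit behind an HTTPS endpoint that parses the request JSON and serializes the returned object.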

  45. Next, add more features
    Simple
    ● Launch requests
    ● Add intents
    ● Wait for response
    ● Handle slots
    ● Add images to cards
    ● Play simple audio
    ● Use SSML
    Medium
    ● User login (OAuth2)
    ● Persistent data
    ● State management
    ● Contexts (Google)
    ● Dialog management (Alexa)
    ● Advanced audio (Alexa)
    Advanced
    ● Customize display on visual devices
    ● Monetization / transactions
    ● Request user permissions
    ● Notifications
    ● Internationalization
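
Of the "simple" features, SSML is easy to illustrate. The sketch below assumes an Alexa-style response (`outputSpeech` of type `SSML`) and uses the standard `<speak>` and `<break>` SSML elements; `escapeXml`, `toSsml`, and `ssmlResponse` are hypothetical helper names, and the 500 ms pause is an arbitrary choice.

```javascript
// Escapes characters that cannot appear as-is inside XML text.
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Joins sentences into one SSML document with a pause between them.
function toSsml(sentences, pauseMs) {
  const body = sentences.map(escapeXml).join(`<break time="${pauseMs}ms"/>`);
  return `<speak>${body}</speak>`;
}

// Builds an Alexa-style response that uses SSML instead of plain text.
function ssmlResponse(sentences) {
  return {
    version: '1.0',
    response: {
      outputSpeech: { type: 'SSML', ssml: toSsml(sentences, 500) },
      shouldEndSession: true,
    },
  };
}
```

Escaping matters because show titles and other content can contain characters such as `&`, which would otherwise produce invalid SSML.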

  46. Publishing a skill/action
    ● By default, skills in development are only available to you
    ● Certification process similar to mobile app store submissions
    ○ ~48-hour turnaround on average
    ○ Feedback is unpredictable!
    ○ Respect existing brands
    ● Can share a "beta" version with co-workers, friends, etc.
    ○ Great for QA as well as hobby projects

  47. Get started without code
    Alexa:
    ● Alexa Skill Blueprints
    Google:
    ● Actions on Google Templates

  48. Lessons learned
    5

  49. The challenges
    ● The code is not hard
    ● What is hard:
    ○ Learning about the platform limitations
    ○ Managing stakeholder expectations
    ○ Understanding & changing user behaviors
    ○ QA
    ● It's not just an engineering challenge!

  50. Audio-first development
    ● You would think these platforms are ideal for audio… but they're not!
    ● It's clear the companies designing these platforms are still focused primarily on Text-to-Speech (TTS)
    ● The Actions on Google audio player is almost unusable
    ● The Alexa audio player has many features but is very unintuitive when you're first working with it

  51. Error handling is hard
    ● Invisible errors:
    ○ The Alexa service / Google / Dialogflow can reject a user's request
    ○ If that happens, your app is not notified at all!
    ○ Logs/analytics can't tell the whole story
    ○ Users often don't understand why it failed
    ● Real user testing is critical!

  52. Prediction:
    In the future, we
    will all be voice
    developers.

  53. The ideal voice developer
    ● Resilience
    ● Patience
    ● Honesty
    ● Openness
    ● Empathy
    ○ Front-end developers have an edge!

  54. AMA (Ask Me Anything)
    6

  55. Thank you!
    Keep in touch:
    @xiehan
