[Tinel] Voice: The New Frontend

Nara Kasbergen

September 05, 2019

Transcript

  1. Voice: The New
    Frontier
    Frontend
    Nara Kasbergen (@xiehan)
    #TinelMeetup
    Thursday, September 5, 2019

  3. What is NPR?
    a
    nationwide
    network of
    public radio
    stations

  4. me

  5. Why voice?
    Then:
    Now:

  7. What we're covering today
    1. Vision of the future
    2. How we got here
    3. Where we are today
    4. How to build for voice today
    5. Lessons learned
    6. AMA (Ask Me Anything)

  8. Vision of the future
    1

  13. Prediction:
    In the future, we
    will all be voice
    developers.

  14. How we got here
    2

  15. Natural Language Processing
    definition:
    the ability of a computer program to understand human
    language as it is spoken. NLP is a component of artificial
    intelligence (AI). (source)
    circa 1960
    first major advances in the 1980s

  16. Machine Learning (ML)
    definition:
    Machine learning is an application of artificial intelligence (AI)
    that provides systems the ability to automatically learn and
    improve from experience without being explicitly programmed.
    Machine learning focuses on the development of computer
    programs that can access data and use it to learn for
    themselves. (source)
    circa 1950s
    first major advances in the 1980s

  17. https://hackernoon.com/moores-law-is-alive-and-well-adc010ea7a63

  18. 1987: Apple's Knowledge Navigator
    1994: IBM Simon is the world's first smart phone
    2009: Microsoft begins work on Cortana
    2010: Siri launches as a standalone iOS app, later acquired by Apple
    2012: Google Now

  19. 2014: Amazon Echo launches (November 6)
    2015: Amazon Alexa Skills Kit launches (June)
    2016: Google Assistant (May) + Google Home (November)
    2017: Samsung Bixby (August) + Microsoft Cortana (October)
    2018: Apple HomePod (February)

  20. “smart speakers”

  22. Rise of multimodal devices

  23. Chatbots
    definition:
    a computer program or an artificial intelligence which conducts
    a conversation via auditory or textual methods. Such programs
    are often designed to convincingly simulate how a human
    would behave as a conversational partner, thereby passing the
    Turing test. (source)
    circa 1966 (ELIZA)
    first major advances in the early 2000s

  24. Google all-in on chatbot tech

  26. Where we are today
    3

  27. The major players

  29. The request/response flow
    Your code
    request
    response

  30. The request/response flow
    Your code
    request
    response
    P.S. all the NLP and ML happens here

  32. Voice for the web
    Web Speech API:
    ● Speech recognition
    ● Speech synthesis (TTS)
    W3C Community specification was published in 2012
    SpeechRecognition interface is currently supported only in Chrome, as an experimental feature
    Uses Google's servers to convert speech to text (requires an Internet connection)
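
A minimal sketch of the speech recognition half of the Web Speech API, with feature detection for the prefixed constructor that Chrome exposes. The interfaces `RecognitionLike` and `SpeechScope` and the function `listenOnce` are hypothetical names introduced here so the example stays self-contained; they only mirror the members of the real `SpeechRecognition` interface that the sketch touches (`lang`, `onresult`, `start()`).

```typescript
// Structural stand-ins for the browser types; only the members used below.
interface RecognitionLike {
  lang: string;
  onresult:
    | ((event: { results: ArrayLike<ArrayLike<{ transcript: string }>> }) => void)
    | null;
  start(): void;
}

interface SpeechScope {
  SpeechRecognition?: new () => RecognitionLike;
  webkitSpeechRecognition?: new () => RecognitionLike; // prefixed name in Chrome
}

// Starts listening and reports the first transcript.
// Returns null when the given scope offers no speech recognition at all.
function listenOnce(
  scope: SpeechScope,
  onText: (text: string) => void
): RecognitionLike | null {
  const Ctor = scope.SpeechRecognition || scope.webkitSpeechRecognition;
  if (!Ctor) {
    return null;
  }
  const recognition = new Ctor();
  recognition.lang = "en-US";
  recognition.onresult = (event) => {
    // Take the first alternative of the first result, if there is one.
    const first = event.results[0]?.[0];
    if (first) {
      onText(first.transcript);
    }
  };
  recognition.start();
  return recognition;
}
```

In a browser the scope would presumably be `window` (cast to `SpeechScope`), and a `null` return is the cue to fall back to a non-voice interface.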

  33. Mozilla Common Voice
    "Common Voice is Mozilla's initiative to help teach machines
    how real people speak." (source)
    Publicly open dataset
    Upload recordings of your voice
    Help reduce bias in Natural Language Processing (NLP) &
    machine learning

  34. Open source alternatives
    Mozilla Scout
    Mycroft

  35. The enterprise voice space
    NLP/ML/AI
    Consumer applications

  36. How to build for voice today
    4

  37. disclaimer
    it's not about the code

  38. The stack
    ● node.js
    ● serverless (AWS Lambda or Google Cloud Functions)
    ● lightweight database (e.g. Amazon DynamoDB)
    ● CI server of choice (e.g. Travis, Jenkins, etc.)
    ● unit testing framework of choice (e.g. Jest)
    ● TypeScript…?

  39. Use the official SDKs
    Alexa node.js SDK:
    github.com/alexa/alexa-skills-kit-sdk-for-nodejs
    Actions on Google node.js SDK:
    github.com/actions-on-google/actions-on-google-nodejs

  40. Glossary
    Alexa               Google / Dialogflow
    skill               action / agent
    invocation name     invocation name
    intent              intent
    slot                entity
    sample utterance    training phrase

  41. “Alexa, ask NPR One
    to play the latest news”

  42. “Alexa, ask NPR One
    to play the latest news”
    invocation name: "NPR One"

  43. “Alexa, ask NPR One
    what's playing”

  44. “Alexa, ask NPR One
    what's playing”
    “Alexa, ask NPR One
    what am I listening to?”

  45. “Alexa, ask NPR One
    what's playing”
    “Alexa, ask NPR One
    what am I listening to?”
    (one intent, two sample utterances)

  46. “Alexa, ask NPR One
    to play Planet Money”
    “Alexa, ask NPR One
    to play Hidden Brain”

  47. “Alexa, ask NPR One
    to play Planet Money”
    “Alexa, ask NPR One
    to play Hidden Brain”
    intent: "to play ..."
    slot (or entity): the show name ("Planet Money", "Hidden Brain")
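
One way to picture how invocation name, intents, slots and sample utterances fit together is a skill's interaction model. The sketch below follows the general shape of the Alexa interaction model JSON, written as a TypeScript object. The intent names (`WhatsPlayingIntent`, `PlayShowIntent`), the `SHOW_NAME` slot type and the helper `undeclaredSlots` are made up for illustration and are not taken from the real NPR One skill.

```typescript
interface SlotDef { name: string; type: string; }
interface IntentDef { name: string; slots?: SlotDef[]; samples: string[]; }
interface InteractionModel {
  interactionModel: {
    languageModel: {
      invocationName: string;
      intents: IntentDef[];
      types: { name: string; values: { name: { value: string } }[] }[];
    };
  };
}

// Hypothetical model: two intents, one of which takes a slot.
const model: InteractionModel = {
  interactionModel: {
    languageModel: {
      invocationName: "npr one",
      intents: [
        {
          name: "WhatsPlayingIntent",
          samples: ["what's playing", "what am i listening to"],
        },
        {
          name: "PlayShowIntent",
          slots: [{ name: "show", type: "SHOW_NAME" }],
          samples: ["to play {show}", "play {show}"],
        },
      ],
      types: [
        {
          name: "SHOW_NAME",
          values: [
            { name: { value: "Planet Money" } },
            { name: { value: "Hidden Brain" } },
          ],
        },
      ],
    },
  },
};

// Lists slot placeholders used in sample utterances but not declared on the intent.
function undeclaredSlots(intent: IntentDef): string[] {
  const declared = (intent.slots || []).map((slot) => slot.name);
  const missing: string[] = [];
  for (const sample of intent.samples) {
    const pattern = /\{(\w+)\}/g;
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(sample)) !== null) {
      const slotName = match[1];
      if (declared.indexOf(slotName) === -1) {
        missing.push(slotName);
      }
    }
  }
  return missing;
}
```

A check like `undeclaredSlots` is the kind of thing that could run in a unit test before submitting the model, since a placeholder without a matching slot is an easy mistake to make by hand.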

  48. Basic code architecture
    JSON event with intent name and slot(s)
    Handler function mapped to that intent
    Use SDK to produce a response with speech, audio, etc.
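
The three steps above can be sketched without any SDK. This is an illustration only: `VoiceEvent` and `VoiceResponse` are hypothetical, simplified shapes (the real platform payloads carry many more fields), and the intent names and the example.com audio URL are placeholders.

```typescript
// Hypothetical, simplified event and response shapes for illustration.
interface VoiceEvent {
  intent: string;
  slots: { [name: string]: string };
}

interface VoiceResponse {
  speech: string;
  audioUrl?: string;
  endSession: boolean;
}

type IntentHandler = (event: VoiceEvent) => VoiceResponse;

// One handler function per intent name.
const intentHandlers: { [intent: string]: IntentHandler } = {
  WhatsPlayingIntent: () => ({
    speech: "You are listening to the latest news.",
    endSession: false,
  }),
  PlayShowIntent: (event) => {
    const show = event.slots["show"];
    if (!show) {
      // Slot missing: ask a follow-up question instead of failing.
      return { speech: "Which show would you like to hear?", endSession: false };
    }
    return {
      speech: `Playing ${show}.`,
      audioUrl: `https://example.com/audio/${encodeURIComponent(show)}.mp3`,
      endSession: true,
    };
  },
};

// Looks up the handler mapped to the incoming intent, with a fallback.
function routeIntent(event: VoiceEvent): VoiceResponse {
  const known = Object.prototype.hasOwnProperty.call(intentHandlers, event.intent);
  if (!known) {
    return { speech: "Sorry, I didn't catch that.", endSession: false };
  }
  return intentHandlers[event.intent](event);
}
```

For example, `routeIntent({ intent: "PlayShowIntent", slots: { show: "Planet Money" } })` produces a response whose speech is "Playing Planet Money.". In practice the official SDKs take care of this mapping and of building the response.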

  49. Alexa "Hello World" skill
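
The code shown on these slides did not survive in the transcript. As a rough stand-in, here is a minimal sketch written directly against the Alexa request and response JSON rather than the official SDK, so that it stays self-contained. The intent name `HelloWorldIntent`, the function names and the spoken phrases are assumptions for illustration; only the subset of envelope fields the sketch needs is typed.

```typescript
// Subset of the Alexa request envelope that this sketch reads.
interface AlexaRequest {
  version: string;
  request: { type: string; intent?: { name: string } };
}

// Subset of the Alexa response envelope that this sketch writes.
interface AlexaResponse {
  version: string;
  response: {
    outputSpeech: { type: "PlainText"; text: string };
    shouldEndSession: boolean;
  };
}

function plainSpeech(text: string, shouldEndSession: boolean): AlexaResponse {
  return {
    version: "1.0",
    response: {
      outputSpeech: { type: "PlainText", text },
      shouldEndSession,
    },
  };
}

// Could serve as the body of an AWS Lambda handler.
function handleAlexaRequest(event: AlexaRequest): AlexaResponse {
  const { type, intent } = event.request;
  if (type === "LaunchRequest") {
    // The user opened the skill without asking for anything specific.
    return plainSpeech("Welcome! You can say hello.", false);
  }
  if (type === "IntentRequest" && intent && intent.name === "HelloWorldIntent") {
    return plainSpeech("Hello, world!", true);
  }
  return plainSpeech("Sorry, I don't know that one.", true);
}
```

With the official Node.js SDK the same logic would be split into request handlers, each deciding whether it can handle a request and then building the response.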

  53. Google "Hello World" action
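
The code on these slides is likewise missing from the transcript. A comparable sketch, assuming the action is backed by a Dialogflow agent that calls a fulfillment webhook in the Dialogflow v2 format: the request carries the matched intent under `queryResult.intent.displayName`, and the simplest response is a `fulfillmentText` string. The intent display name "Hello World" and the function name are hypothetical, and the official actions-on-google SDK would normally hide this JSON.

```typescript
// Subset of the Dialogflow v2 webhook request that this sketch reads.
interface DialogflowRequest {
  queryResult: {
    intent: { displayName: string };
    parameters?: { [name: string]: unknown };
  };
}

// Simplest possible webhook response: plain text to speak or display.
interface DialogflowResponse {
  fulfillmentText: string;
}

// Could serve as the body of an HTTP-triggered cloud function.
function handleDialogflowRequest(body: DialogflowRequest): DialogflowResponse {
  const intentName = body.queryResult.intent.displayName;
  if (intentName === "Hello World") {
    return { fulfillmentText: "Hello, world!" };
  }
  return { fulfillmentText: "Sorry, I can't help with that yet." };
}
```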

  55. Next, add more features
    Simple
    ● Launch requests
    ● Add intents
    ● Wait for response
    ● Handle slots
    ● Add images to cards
    ● Play simple audio
    ● Use SSML
    Medium
    ● User login (OAuth2)
    ● Persistent data
    ● State management
    ● Contexts (Google)
    ● Dialog management (Alexa)
    ● Advanced audio (Alexa)
    Advanced
    ● Customize display on visual devices
    ● Monetization / transactions
    ● Request user permissions
    ● Notifications
    ● Internationalization
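
Of the "Simple" features, SSML is the easiest to show in a few lines. The helper names below (`escapeSsml`, `pauseTag`, `toSsml`) are hypothetical. The sketch escapes XML-reserved characters in the text and joins sentences with a `<break>` pause inside a `<speak>` root, both of which are standard SSML elements.

```typescript
// Escapes characters that would otherwise break the SSML markup.
function escapeSsml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// A pause of the given length in milliseconds.
function pauseTag(ms: number): string {
  return `<break time="${ms}ms"/>`;
}

// Wraps sentences in <speak>, separated by short pauses.
function toSsml(sentences: string[], pauseMs: number = 300): string {
  const body = sentences.map((sentence) => escapeSsml(sentence)).join(pauseTag(pauseMs));
  return `<speak>${body}</speak>`;
}
```

For instance, `toSsml(["Planet Money & Hidden Brain", "Enjoy!"], 500)` yields `<speak>Planet Money &amp; Hidden Brain<break time="500ms"/>Enjoy!</speak>`.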

  56. Publishing a skill/action
    ● By default, skills in development are only available to you
    ● Certification process similar to mobile app store submissions
    ○ ~48-hour turnaround on average
    ○ Feedback is unpredictable!
    ○ Respect existing brands
    ● Can share a "beta" version with co-workers, friends, etc.
    ○ Great for QA as well as hobby projects

  57. Get started without code
    Alexa:
    ● Alexa Skill Blueprints
    Google:
    ● Actions on Google Templates

  58. Lessons learned
    5

  59. The challenges
    ● The code is not hard
    ● What is hard:
    ○ Learning about the platform limitations
    ○ Managing stakeholder expectations
    ○ Understanding & changing user behaviors
    ○ QA
    ● It's not just an engineering challenge!

  60. Audio-first development
    ● You would think these platforms are ideal for audio… but they're not!
    ● It's clear the companies designing these platforms are still focused primarily on Text-to-Speech (TTS)
    ● The Actions on Google audio player is almost unusable
    ● The Alexa audio player has many features but is very unintuitive when you're first working with it

  61. Error handling is hard
    ● Invisible errors:
    ○ The Alexa service / Google / Dialogflow can reject a user's request
    ○ If that happens, your app is not notified at all!
    ○ Logs/analytics can't tell the whole story
    ○ Users often don't understand why it failed
    ● Real user testing is critical!

  62. Prediction:
    In the future, we
    will all be voice
    developers.

  63. The ideal voice developer
    ● Resilience
    ● Patience
    ● Honesty
    ● Openness
    ● Empathy
    ○ Front-end developers have an edge!

  64. AMA (Ask Me Anything)
    6

  66. Thank you!
    Keep in touch:
    @xiehan
    [email protected]
