[jsDay] Finding Your Voice: Building Screenless Interfaces with Node.js

Nara Kasbergen

May 10, 2018

Transcript

  1. Finding Your Voice:
    Building Screenless
    Interfaces with Node.js
    Nara Kasbergen (@xiehan) | jsDay Italy | May 10, 2018

  2. Who am I?
    ▷ Senior full-stack web developer
    ▷ At NPR since March 2014
    ▷ Part of a 5-member skunkworks
    team focused 100% on voice UI
    development
    ○ Formed in September 2017

  3. What is NPR?
    A quick explainer for the Italians in the audience:

  4. Why voice UI development?
    Then: Now:

  5. “smart speakers”

  6. “smart speakers”

  7. “voice assistants”

  8. How Amazon views Alexa
    “Alexa, set the
    thermostat to
    25 degrees.”
    “Okay.”
    “I'd like to
    reorder paper
    towels please.”
    “Alexa, thank
    you!”
    “No problem.”

  9. “What would you want
    to know about
    voice UI development?”

  10. What can I actually make?

  11. Is it possible to build one app for
    Amazon Echo, Google Home,
    and Apple HomePod?

  12. To understand the present,
    we must understand the past.
    1. What can you actually make?

  13. A brief timeline of voice assistants
    2014: Amazon Echo launches (November 6)
    2015: Amazon Alexa Skills Kit launches (June)
    2016: Google Assistant (May) + Google Home (November)
    2017: Samsung Bixby (August) + Microsoft Cortana (October)
    2018: Apple HomePod (February)

  14. A natural evolution
    1. add voice activation to existing custom app ecosystem
    2. add content via RSS feeds
    3. add support for custom “skills”

  15. A natural evolution
    1. add voice activation to existing custom app ecosystem
    2. add content via RSS feeds
    3. add support for custom “skills”

  16. Conclusions
    ▷ Amazon has a 2-year lead
    ▷ Only Amazon and Google have
    fully developed ecosystems
    ▷ A big focus is adding access to
    news and podcasts via RSS
    ▷ Home automation is secondary

  17. tl;dr yes … and no
    2. Can you build one “skill” to rule them all?

  18. Alexa + Google ecosystems
    ▷ Heavily leverage their existing
    cloud infrastructure
    ○ AWS Lambda + Google Cloud Functions
    ▷ Can also build a traditional REST
    API accessed by their services
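Both platforms ultimately send your code a JSON request and expect JSON back, so the same core logic can sit behind a cloud function or behind your own REST endpoint. A minimal sketch, assuming a hypothetical `buildResponse` core function: the `async (event) => …` shape is AWS Lambda's standard Node.js handler signature, and the `http` listener stands in for a traditional REST API. Verifying that requests really come from the platform is deliberately left out.

```javascript
// Hypothetical core logic: takes the parsed request JSON the platform sends
// and returns response JSON. It knows nothing about how it is hosted.
function buildResponse(requestBody) {
  const type =
    requestBody && requestBody.request ? requestBody.request.type : "unknown";
  return {
    version: "1.0",
    response: {
      outputSpeech: { type: "PlainText", text: "Request type: " + type },
      shouldEndSession: true,
    },
  };
}

// Hosting option 1: an AWS Lambda-style async handler; the event is the request JSON.
const lambdaHandler = async (event) => buildResponse(event);

// Hosting option 2: a plain Node.js HTTP request listener, i.e. your own REST endpoint.
// A production Alexa HTTPS endpoint must also verify that the request really
// came from Amazon (signature checks); that step is omitted here.
function httpListener(req, res) {
  let raw = "";
  req.on("data", (chunk) => {
    raw += chunk;
  });
  req.on("end", () => {
    let body;
    try {
      body = JSON.parse(raw);
    } catch (err) {
      res.writeHead(400, { "Content-Type": "application/json" });
      res.end(JSON.stringify({ error: "invalid JSON" }));
      return;
    }
    res.writeHead(200, { "Content-Type": "application/json" });
    res.end(JSON.stringify(buildResponse(body)));
  });
}

// To serve it locally: require("http").createServer(httpListener).listen(3000);
```

Keeping `buildResponse` free of hosting details is what makes switching between Lambda and a REST API cheap.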

  19. The request/response flow
    [Diagram: request → Your code → response]

  20. The request/response flow
    [Diagram: request → Your code → response]
    P.S. all the NLP and ML happens on the platform's side,
    before the request reaches your code
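Because speech recognition and intent matching happen upstream, your code mostly routes on fields that arrive already resolved. A minimal sketch against an Alexa-style request envelope (`request.type`, `request.intent.name`, `request.intent.slots`); the `PlayStationIntent` intent and its `station` slot are hypothetical examples, not part of any real skill.

```javascript
// Builds a plain-text Alexa-style response.
function speech(text, endSession) {
  return {
    version: "1.0",
    response: {
      outputSpeech: { type: "PlainText", text },
      shouldEndSession: endSession,
    },
  };
}

// Routes a request whose intent and slots the platform has already resolved.
// "PlayStationIntent" and its "station" slot are hypothetical.
function route(envelope) {
  const request = envelope.request;
  if (request.type === "LaunchRequest") {
    return speech("Welcome! Which station would you like?", false);
  }
  if (request.type === "IntentRequest" && request.intent.name === "PlayStationIntent") {
    // Slot values arrive already extracted by the platform's NLU.
    const slot = request.intent.slots && request.intent.slots.station;
    const station = slot && slot.value ? slot.value : "your local station";
    return speech(`Playing ${station}.`, true);
  }
  // Anything else: keep the session open and ask again.
  return speech("Sorry, I didn't catch that.", false);
}
```

Note that nothing here deals with audio or language understanding; by the time `route` runs, the hard ML work is done.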

  21. The future is “serverless”
    ▷ Others can speak more eloquently
    on this subject than me
    ○ Hopefully you went to Luciano's talk
    ▷ Let's just assume we want to use
    Lambda or Cloud Functions…
    ▷ … node.js wins!

  22. The official SDKs are not bad
    Alexa node.js SDK:
    github.com/alexa/alexa-skills-kit-sdk-for-nodejs
    Actions on Google node.js SDK:
    github.com/actions-on-google/actions-on-google-nodejs

  23. Examples from Alexa SDK
    responseBuilder.speak("Hello!");
    responseBuilder.reprompt("Hello?");
    responseBuilder.withSimpleCard("Card Title", "Content!");
    responseBuilder.addAudioPlayerPlayDirective(...url);
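To show roughly what JSON those SDK calls produce, here is a hypothetical, stripped-down builder that mimics the same chaining style. It is not the SDK's implementation and covers only these four methods (plus `getResponse`); the output follows the Alexa custom-skill response format (`outputSpeech`, `reprompt`, `card`, `AudioPlayer.Play` directive).

```javascript
// Hypothetical mini builder mimicking the ASK SDK's responseBuilder chaining.
// NOT the SDK's code; it only illustrates the resulting response JSON.
class MiniResponseBuilder {
  constructor() {
    this.response = {};
  }

  speak(text) {
    this.response.outputSpeech = { type: "SSML", ssml: `<speak>${text}</speak>` };
    return this;
  }

  reprompt(text) {
    this.response.reprompt = {
      outputSpeech: { type: "SSML", ssml: `<speak>${text}</speak>` },
    };
    // A reprompt only makes sense if the session stays open for an answer.
    this.response.shouldEndSession = false;
    return this;
  }

  withSimpleCard(title, content) {
    this.response.card = { type: "Simple", title, content };
    return this;
  }

  addAudioPlayerPlayDirective(playBehavior, url, token, offsetInMilliseconds) {
    this.response.directives = this.response.directives || [];
    this.response.directives.push({
      type: "AudioPlayer.Play",
      playBehavior,
      audioItem: { stream: { url, token, offsetInMilliseconds } },
    });
    return this;
  }

  getResponse() {
    return this.response;
  }
}
```

In the real SDK these calls hang off `handlerInput.responseBuilder` inside a request handler, and `getResponse()` hands the finished object back to the SDK.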

  24. SSML: A common language
    <speak>
    WFUV is your station.
    There is a three second pause here <break time="3s"/>
    then I continue.
    When I wake up, <prosody rate="x-slow">I speak slowly</prosody>.
    </speak>
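Since SSML is a shared XML dialect, a few small helpers keep it out of string soup. A minimal sketch: the tags (`<speak>`, `<break>`, `<prosody>`) are standard SSML, but the helper names are made up, and the exact set of supported tags and attribute values differs between Alexa and Google, so check each platform's SSML reference.

```javascript
// Any user-visible text must be XML-escaped; a stray "&" (e.g. in a show
// title) would otherwise make the whole SSML document invalid.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical helper names; each returns an SSML fragment.
const say = (text) => escapeXml(text);
const pause = (seconds) => `<break time="${seconds}s"/>`;
const slowly = (text) => `<prosody rate="x-slow">${escapeXml(text)}</prosody>`;

// Joins fragments with spaces and wraps them in the root <speak> element.
const speak = (...fragments) => `<speak>${fragments.join(" ")}</speak>`;

console.log(speak(say("There is a three second pause here"), pause(3), say("then I continue.")));
```

Escaping inside the helpers (rather than at the end) matters: escaping the finished string would also escape the tags themselves.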

  25. Backends-for-Frontends (BFFs)
    [Diagram: BFF on AWS Lambda → request → The "real" API → response → BFF]
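The BFF's job is to call the "real" API and reshape its data into what a voice client needs. A minimal sketch under assumptions: the API client is injected as a function so the transformation can be tested without a network, and the episode fields (`title`, `host`, `audioUrl`) are a hypothetical data shape, not NPR's actual API.

```javascript
// Pure transformation: "real" API payload -> voice-friendly model.
// Drops everything a speaker can't use and pre-composes the spoken intro.
function toVoiceModel(episode) {
  return {
    speech: `Here is ${episode.title}, hosted by ${episode.host}.`,
    audioUrl: episode.audioUrl,
  };
}

// The BFF operation: fetch from the "real" API, then transform.
// fetchEpisode is a hypothetical injected client, e.g. a wrapper around an HTTP library.
async function getEpisodeForVoice(fetchEpisode, episodeId) {
  const episode = await fetchEpisode(episodeId);
  return toVoiceModel(episode);
}
```

Keeping `toVoiceModel` pure means most of the BFF can be unit-tested without mocking HTTP at all.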

  26. Two “skills”, one codebase
    [Diagram: two BFFs on AWS Lambda, both calling The "real" API]
    >60% shared code with separate view layers
    different builds using Gulp

  27. Generic Response Model
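One way to get shared code with separate view layers is to have the shared code emit a platform-neutral response object and let a small renderer per platform turn it into that platform's JSON. A minimal sketch: the generic model's fields are hypothetical (not NPR's actual model), the Alexa output follows the custom-skill response format, and the Google output follows the 2018-era Dialogflow v2 webhook payload for Actions on Google, a format Google has since retired.

```javascript
// Hypothetical platform-neutral response produced by the shared code.
// reprompt === null means "we're done, end the conversation".
function makeGenericResponse({ speech, reprompt = null }) {
  return { speech, reprompt };
}

// Alexa view layer: custom-skill response JSON.
function renderAlexa(model) {
  const response = {
    outputSpeech: { type: "SSML", ssml: `<speak>${model.speech}</speak>` },
    shouldEndSession: model.reprompt === null,
  };
  if (model.reprompt !== null) {
    response.reprompt = {
      outputSpeech: { type: "SSML", ssml: `<speak>${model.reprompt}</speak>` },
    };
  }
  return { version: "1.0", response };
}

// Google view layer: Dialogflow v2 webhook payload (2018-era Actions on Google).
// The reprompt text itself is not rendered here; it only decides whether we
// keep listening for the user.
function renderGoogle(model) {
  return {
    payload: {
      google: {
        expectUserResponse: model.reprompt !== null,
        richResponse: {
          items: [{ simpleResponse: { textToSpeech: `<speak>${model.speech}</speak>` } }],
        },
      },
    },
  };
}
```

With this split, a build step only needs to choose which renderer ships with which deployment; the rest of the code stays identical.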

  28. Challenges
    ▷ Text-to-Speech (TtS) is still king
    ○ Google didn't even add support for their
    native audio player until February 2018
    ▷ No access to the user's location
    ▷ Error handling is interesting!
    ○ User might not even trigger your skill
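You can only handle the failures that actually reach your code (an utterance routed elsewhere by the platform is invisible to you), but for the ones that do, a gentle retry loop beats silently ending the session. A minimal sketch using Alexa-style session attributes (`session.attributes` in the request, `sessionAttributes` in the response); `MAX_RETRIES`, the `retries` attribute, and the prompt wording are hypothetical.

```javascript
// Hypothetical retry limit before giving up politely.
const MAX_RETRIES = 2;

// Called when a request arrives that we can't map to anything useful
// (for example, a fallback intent). Counts consecutive misses per session.
function handleUnrecognized(envelope) {
  const attributes = (envelope.session && envelope.session.attributes) || {};
  const retries = (attributes.retries || 0) + 1;

  if (retries > MAX_RETRIES) {
    return {
      version: "1.0",
      response: {
        outputSpeech: { type: "PlainText", text: "Sorry, I'm still having trouble. Goodbye for now." },
        shouldEndSession: true,
      },
    };
  }

  return {
    version: "1.0",
    // Session attributes are echoed back by the platform on the next request.
    sessionAttributes: { ...attributes, retries },
    response: {
      outputSpeech: { type: "PlainText", text: "Sorry, I didn't get that. You can ask me to play a station." },
      reprompt: { outputSpeech: { type: "PlainText", text: "Which station would you like?" } },
      shouldEndSession: false,
    },
  };
}
```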

  29. Conclusions
    ▷ The code is not hard
    ▷ Understanding platform limitations
    and user expectations is the hard part

  30. Open source opportunities
    ▷ Would it be helpful to have a
    formalized framework?

  31. Open source opportunities
    ▷ Would it be helpful to have a
    formalized framework?
    ○ Not really. The code is not hard.
    ▷ What we struggle with the most: QA
    ○ We need something like Selenium or
    Nightwatch.js for voice UI

  32. Thank you!
    @xiehan
    https://npr.codes
