[jsDay] Finding Your Voice: Building Screenless Interfaces with Node.js

Slide 1

Slide 1 text

Finding Your Voice: Building Screenless Interfaces with Node.js Nara Kasbergen (@xiehan) | jsDay Italy | May 10, 2018

Slide 2

Slide 2 text

Who am I? ▷ Senior full-stack web developer ▷ At NPR since March 2014 ▷ Part of a 5-member skunkworks team focused 100% on voice UI development ○ Formed in September 2017

Slide 3

Slide 3 text

What is NPR? A quick explainer for the Italians in the audience: +

Slide 4

Slide 4 text

Why voice UI development? Then: Now:

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

“smart speakers”

Slide 7

Slide 7 text

“smart speakers”

Slide 8

Slide 8 text

“voice assistants”

Slide 9

Slide 9 text

How Amazon views Alexa “Alexa, set the thermostat to 25 degrees.” “Okay.” “I'd like to reorder paper towels please.” “Alexa, thank you!” “No problem.”

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

“What would you want to know about voice UI development?”

Slide 12

Slide 12 text

“What can I actually make?”

Slide 13

Slide 13 text

“Is it possible to build one app for Amazon Echo, Google Home, and Apple HomePod?”

Slide 14

Slide 14 text

1. What can you actually make? To understand the present, we must understand the past.

Slide 15

Slide 15 text

A brief timeline of voice assistants ▷ 2014: Amazon Echo launches (November 6) ▷ 2015: Amazon Alexa Skills Kit launches (June) ▷ 2016: Google Assistant (May) + Google Home (November) ▷ 2017: Samsung Bixby (August) + Microsoft Cortana (October) ▷ 2018: Apple HomePod (February)

Slide 16

Slide 16 text

A natural evolution 1. Add voice activation to existing custom app ecosystem 2. Add content via RSS feeds 3. Add support for custom “skills”

Slide 17

Slide 17 text

A natural evolution 1. Add voice activation to existing custom app ecosystem 2. Add content via RSS feeds 3. Add support for custom “skills”

Slide 18

Slide 18 text

Conclusions ▷ Amazon has a 2-year lead ▷ Only Amazon and Google have fully developed ecosystems ▷ A big focus is adding access to news and podcasts via RSS ▷ Home automation is secondary

Slide 19

Slide 19 text

2. Can you build one “skill” to rule them all? tl;dr: yes … and no

Slide 20

Slide 20 text

Alexa + Google ecosystems ▷ Heavily leverage their existing cloud infrastructure ○ AWS Lambda + Google Cloud Functions ▷ Can also build a traditional REST API accessed by their services

Slide 21

Slide 21 text

The request/response flow Your code request response

Slide 22

Slide 22 text

The request/response flow Your code request response P.S. all the NLP and ML happens here
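As a rough illustration of this flow, the sketch below shows what "your code" could look like for an Alexa custom skill on AWS Lambda without any SDK. By the time the request arrives, the platform has already turned speech into an intent name, so the code only maps that intent to a response. The intent name `HelloIntent` and the reply wording are made up for illustration; the envelope fields follow the Alexa custom-skill JSON interface.

```javascript
'use strict';

// Build a minimal Alexa response envelope with plain-text speech.
function buildResponse(text, shouldEndSession) {
  return {
    version: '1.0',
    response: {
      outputSpeech: { type: 'PlainText', text },
      shouldEndSession,
    },
  };
}

// Map an already-parsed request to a response.
// `HelloIntent` is a hypothetical intent from the skill's interaction model.
function handleRequest(event) {
  const request = event.request || {};
  if (request.type === 'LaunchRequest') {
    return buildResponse('Welcome! What would you like to hear?', false);
  }
  if (request.type === 'IntentRequest' && request.intent.name === 'HelloIntent') {
    return buildResponse('Hello!', true);
  }
  return buildResponse("Sorry, I didn't catch that.", true);
}

// Async wrapper in the shape Lambda expects for a Node.js handler.
const handler = async (event) => handleRequest(event);
```

On Lambda, `handler` would be exported (for example as `exports.handler`) so the platform can invoke it with each incoming request.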

Slide 23

Slide 23 text

The future is “serverless” ▷ Others can speak more eloquently on this subject than I can ○ Hopefully you went to Luciano's talk ▷ Let's just assume we want to use Lambda or Cloud Functions… ▷ … Node.js wins!

Slide 24

Slide 24 text

The official SDKs are not bad ▷ Alexa Node.js SDK: github.com/alexa/alexa-skills-kit-sdk-for-nodejs ▷ Actions on Google Node.js SDK: github.com/actions-on-google/actions-on-google-nodejs

Slide 25

Slide 25 text

Examples from Alexa SDK
responseBuilder.speak("Hello!");
responseBuilder.reprompt("Hello?");
responseBuilder.withSimpleCard("Card Title", "Content!");
responseBuilder.addAudioPlayerPlayDirective(...url);

Slide 26

Slide 26 text

SSML: A common language WFUV is your station. There is a three second pause here then I continue. When I wake up, I speak slowly.
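The three sentences on this slide could be marked up in SSML roughly as follows. This is only a sketch: the choice of `say-as`, `break`, and `prosody`, and the attribute values, are assumptions picked to match the effects the sentences describe, and the helper names are hypothetical.

```javascript
'use strict';

// Escape characters that are special in XML so plain text is safe inside SSML.
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Small helpers, one per SSML element used below.
const spellOut = (text) => `<say-as interpret-as="spell-out">${escapeXml(text)}</say-as>`;
const pause = (seconds) => `<break time="${seconds}s"/>`;
const slowly = (text) => `<prosody rate="x-slow">${escapeXml(text)}</prosody>`;

// Assemble the full SSML document as a single string.
const ssml = [
  '<speak>',
  `${spellOut('WFUV')} is your station. `,
  `There is a three second pause here ${pause(3)} then I continue. `,
  `When I wake up, ${slowly('I speak slowly')}.`,
  '</speak>',
].join('');
```

The resulting string can be passed wherever a platform accepts SSML speech output; exact tag support differs between platforms, so each tag is worth checking against the platform's SSML reference.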

Slide 27

Slide 27 text

Backends-for-Frontends (BFFs) [Diagram: requests and responses pass through a BFF on AWS Lambda that sits in front of the "real" API]
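A minimal sketch of the BFF idea, assuming a "real" API that returns far more data than a voice interface needs. The payload shape (`items`, `title`, `audioUrl`) and the function names are hypothetical and do not describe NPR's actual API.

```javascript
'use strict';

// Reduce a verbose API payload to the few fields a voice frontend needs.
// The input shape (`items`, `title`, `audioUrl`) is hypothetical.
function toVoicePayload(apiResult) {
  const first = (apiResult.items || [])[0];
  if (!first) {
    return { speech: 'Nothing is available right now.', audioUrl: null };
  }
  return { speech: `Now playing ${first.title}.`, audioUrl: first.audioUrl };
}

// BFF entry point. `fetchFromApi` is injected so the "real" API client
// can be swapped out or stubbed in tests.
async function bff(fetchFromApi) {
  const apiResult = await fetchFromApi();
  return toVoicePayload(apiResult);
}
```

Injecting the API client keeps the reshaping logic free of network code, which makes it straightforward to exercise with canned payloads.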

Slide 28

Slide 28 text

Two “skills”, one codebase [Diagram: two BFFs on AWS Lambda in front of the "real" API] ▷ >60% shared code with separate view layers ▷ Different builds using Gulp

Slide 29

Slide 29 text

Generic Response Model
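One way to picture a generic response model with separate view layers is sketched below. It is not the speaker's actual implementation: the model fields and function names are hypothetical, the Alexa output follows the custom-skill response envelope, and the Google output assumes a Dialogflow v2 webhook response carrying an Actions on Google payload.

```javascript
'use strict';

// Platform-neutral description of what the skill wants to say.
function genericResponse(speech, { reprompt = null, endSession = false } = {}) {
  return { speech, reprompt, endSession };
}

// View layer for Alexa: renders the generic model as a response envelope.
function toAlexa(model) {
  const response = {
    outputSpeech: { type: 'SSML', ssml: `<speak>${model.speech}</speak>` },
    shouldEndSession: model.endSession,
  };
  if (model.reprompt) {
    response.reprompt = {
      outputSpeech: { type: 'SSML', ssml: `<speak>${model.reprompt}</speak>` },
    };
  }
  return { version: '1.0', response };
}

// View layer for Actions on Google (assumed Dialogflow v2 webhook format).
function toGoogle(model) {
  return {
    payload: {
      google: {
        expectUserResponse: !model.endSession,
        richResponse: {
          items: [
            { simpleResponse: { textToSpeech: `<speak>${model.speech}</speak>` } },
          ],
        },
      },
    },
  };
}

// Shared business logic produces one model; each build renders it differently.
const model = genericResponse('Hello!', { reprompt: 'Hello?' });
const alexaJson = toAlexa(model);
const googleJson = toGoogle(model);
```

Keeping the model free of platform details means the shared code never needs to know which assistant it is talking to; only the thin render functions differ per build.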

Slide 30

Slide 30 text

Challenges ▷ Text-to-Speech (TTS) is still king ○ Google didn't even add support for their native audio player until February 2018 ▷ No access to the user's location ▷ Error handling is interesting! ○ User might not even trigger your skill

Slide 31

Slide 31 text

Conclusions ▷ The code is not hard ▷ Understanding platform limitations and user expectations is

Slide 32

Slide 32 text

No content

Slide 33

Slide 33 text

Open source opportunities ▷ Would it be helpful to have a formalized framework?

Slide 34

Slide 34 text

Open source opportunities ▷ Would it be helpful to have a formalized framework? ○ Not really. The code is not hard. ▷ What we struggle with the most: QA ○ We need something like Selenium or Nightwatch.js for voice UI

Slide 35

Slide 35 text

Thank you! [email protected] @xiehan https://npr.codes