[JSInteractive] Finding Your Voice: Building Screenless Interfaces with Node.js

Slide 1

Slide 1 text

Finding Your Voice: Building Screenless Interfaces with Node.js | Nara Kasbergen (@xiehan) | Node+JS Interactive | October 11, 2018

Slide 2

Slide 2 text

What is NPR? A quick explainer for the Canadians in the audience:

Slide 3

Slide 3 text

Who am I? ▷ Sr. full-stack web developer ▷ At NPR since March 2014 ▷ Part of a skunkworks team focused 100% on voice UI development ○ Formed in September 2017

Slide 4

Slide 4 text

Why voice UI development? Then: Now:

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

“smart speakers”

Slide 7

Slide 7 text

“smart speakers”

Slide 8

Slide 8 text

“voice assistants”

Slide 9

Slide 9 text

How Amazon views Alexa “Alexa, set the thermostat to 25 degrees.” “Okay.” “I'd like to reorder paper towels please.” “Alexa, thank you!” “No problem.”

Slide 10

Slide 10 text

“What would you want to know about voice UI development?”

Slide 11

Slide 11 text

“What can I actually make?”

Slide 12

Slide 12 text

“Is it possible to build one app for Amazon Echo, Google Home, and Apple HomePod?”

Slide 13

Slide 13 text

1. What can you actually make? To understand the present, we must understand the past.

Slide 14

Slide 14 text

A brief timeline of voice assistants ▷ 2014: Amazon Echo launches (November 6) ▷ 2015: Amazon Alexa Skills Kit launches (June) ▷ 2016: Google Assistant (May) + Google Home (November) ▷ 2017: Samsung Bixby (August) + Microsoft Cortana (October) ▷ 2018: Apple HomePod (February)

Slide 15

Slide 15 text

A natural evolution 1. Add voice activation to existing custom app ecosystem 2. Add content via RSS feeds 3. Add support for custom “skills”

Slide 16

Slide 16 text

A natural evolution 1. Add voice activation to existing custom app ecosystem 2. Add content via RSS feeds 3. Add support for custom “skills”

Slide 17

Slide 17 text

Conclusions ▷ Amazon has a 2-year lead ▷ Only Amazon and Google have fully developed ecosystems ▷ A big focus is adding access to news and podcasts via RSS ▷ Home automation is secondary

Slide 18

Slide 18 text

2. Can you build one “skill” to rule them all? tl;dr: yes … and no

Slide 19

Slide 19 text

Alexa + Google ecosystems ▷ Heavily leverage their existing cloud infrastructure ○ AWS Lambda + Google Cloud Functions ▷ Can also build a traditional REST API accessed by their services

Slide 20

Slide 20 text

The request/response flow [Diagram labels: Your code; request; response]

Slide 21

Slide 21 text

The request/response flow [Diagram labels: Your code; request; response; annotation: “P.S. all the NLP and ML happens here”]
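The flow above can be sketched as a plain function that takes the platform's JSON request and returns a JSON response. Speech recognition and intent matching are done by the voice platform before the request arrives, so the code only branches on a request type and an intent name. This is a minimal sketch, assuming Alexa's custom-skill JSON envelope; the intent name `PlayStationIntent` and the spoken strings are hypothetical.

```javascript
// Sketch of a skill backend as a plain function: JSON in, JSON out.
// The envelope fields follow Alexa's custom-skill JSON format; the intent
// name "PlayStationIntent" and the spoken text are made up for illustration.
function handleRequest(event) {
  const request = event.request || {};
  let text;
  let endSession = false;

  if (request.type === 'LaunchRequest') {
    text = 'Welcome. What would you like to hear?';
  } else if (
    request.type === 'IntentRequest' &&
    request.intent &&
    request.intent.name === 'PlayStationIntent'
  ) {
    // By this point the platform has already mapped speech to an intent name.
    text = 'Okay, playing your station.';
    endSession = true;
  } else {
    text = "Sorry, I didn't catch that.";
  }

  return {
    version: '1.0',
    response: {
      outputSpeech: { type: 'PlainText', text },
      shouldEndSession: endSession,
    },
  };
}

// On AWS Lambda this could be exported as the handler (hypothetical wiring):
// exports.handler = async (event) => handleRequest(event);

const reply = handleRequest({
  version: '1.0',
  request: { type: 'IntentRequest', intent: { name: 'PlayStationIntent' } },
});
console.log(JSON.stringify(reply, null, 2));
```

Keeping the logic in a plain function, separate from the Lambda wiring, also makes it easy to call with canned requests in tests.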

Slide 22

Slide 22 text

The future is “serverless” ▷ Others can speak more eloquently on this subject than I can ○ Several other talks on serverless ▷ Let's just assume we want to use Lambda or Cloud Functions… ▷ … Node.js wins!

Slide 23

Slide 23 text

The official SDKs are not bad ▷ Alexa Node.js SDK: github.com/alexa/alexa-skills-kit-sdk-for-nodejs ▷ Actions on Google Node.js SDK: github.com/actions-on-google/actions-on-google-nodejs

Slide 24

Slide 24 text

Examples from Alexa SDK responseBuilder.speak("Hello!"); responseBuilder.reprompt("Hello?"); responseBuilder.withSimpleCard("Card Title", "Content!"); responseBuilder.addAudioPlayerPlayDirective(...url);
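To make the fluent calls above more concrete, here is a small stand-in builder that mimics `speak`, `reprompt`, and `withSimpleCard` and shows the kind of response JSON they describe. It is an illustration only, not the SDK's implementation: `createResponseBuilder` is a hypothetical name, and the audio player directive is left out.

```javascript
// Stand-in for illustration only: NOT the SDK's code. It mimics the fluent
// calls from the slide and shows the kind of response JSON they describe.
function createResponseBuilder() {
  const response = {};
  const builder = {
    speak(text) {
      response.outputSpeech = { type: 'SSML', ssml: `<speak>${text}</speak>` };
      return builder;
    },
    reprompt(text) {
      response.reprompt = {
        outputSpeech: { type: 'SSML', ssml: `<speak>${text}</speak>` },
      };
      // A reprompt implies we are waiting for the user to answer.
      response.shouldEndSession = false;
      return builder;
    },
    withSimpleCard(title, content) {
      response.card = { type: 'Simple', title, content };
      return builder;
    },
    getResponse() {
      return response;
    },
  };
  return builder;
}

const built = createResponseBuilder()
  .speak('Hello!')
  .reprompt('Hello?')
  .withSimpleCard('Card Title', 'Content!')
  .getResponse();
console.log(JSON.stringify(built, null, 2));
```

Each method returns the builder itself, which is what allows the calls to be chained.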

Slide 25

Slide 25 text

SSML: A common language WFUV is your station. There is a three second pause here then I continue. When I wake up, I speak slowly.
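The slide's original markup is not shown in this text, so the tags below are an assumption about what SSML for these sentences might look like. The sketch uses three standard SSML elements (`say-as`, `break`, `prosody`); the helper names are hypothetical.

```javascript
// Escape characters that would otherwise break the XML.
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Small helpers (hypothetical names) that wrap text in SSML elements.
const pause = (seconds) => `<break time="${seconds}s"/>`;
const slow = (text) => `<prosody rate="slow">${escapeXml(text)}</prosody>`;
const spellOut = (text) =>
  `<say-as interpret-as="spell-out">${escapeXml(text)}</say-as>`;

// Assumed markup for the slide's three sentences.
const ssml = [
  '<speak>',
  `${spellOut('WFUV')} is your station. `,
  `There is a three second pause here ${pause(3)} then I continue. `,
  `When I wake up, ${slow('I speak slowly.')}`,
  '</speak>',
].join('');

console.log(ssml);
```

Support for individual tags and attribute values differs between platforms, so markup like this is worth checking against each platform's SSML reference.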

Slide 26

Slide 26 text

Backends-for-Frontends (BFFs) [Diagram labels: BFF on AWS Lambda; The "real" API; request; response]

Slide 27

Slide 27 text

Two “skills”, one codebase [Diagram labels: BFF on AWS Lambda; The "real" API; BFF on AWS Lambda] ▷ >60% shared code with separate view layers ▷ Different builds using Gulp

Slide 28

Slide 28 text

Generic Response Model
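One way to picture a generic response model with separate view layers is sketched below. Shared code builds a platform-neutral object, and a thin renderer per platform turns it into that platform's JSON. All function and field names here are hypothetical, and both output shapes are simplified rather than complete platform payloads; the Google shape in particular is an assumption modeled on a simple-response webhook payload.

```javascript
// Shared code: a platform-neutral description of what to say.
// Field names are hypothetical.
function buildGenericResponse(stationName) {
  return {
    speech: `Now playing ${stationName}.`,
    reprompt: null,
    endSession: true,
  };
}

// View layer for Alexa: maps the generic model onto Alexa-style JSON.
function renderForAlexa(model) {
  const response = {
    outputSpeech: { type: 'SSML', ssml: `<speak>${model.speech}</speak>` },
    shouldEndSession: model.endSession,
  };
  if (model.reprompt) {
    response.reprompt = {
      outputSpeech: { type: 'SSML', ssml: `<speak>${model.reprompt}</speak>` },
    };
  }
  return { version: '1.0', response };
}

// View layer for Google: a simplified, assumed shape for illustration.
function renderForGoogle(model) {
  return {
    expectUserResponse: !model.endSession,
    richResponse: {
      items: [{ simpleResponse: { textToSpeech: model.speech } }],
    },
  };
}

const model = buildGenericResponse('WFUV');
const alexaJson = renderForAlexa(model);
const googleJson = renderForGoogle(model);
console.log(JSON.stringify({ alexaJson, googleJson }, null, 2));
```

With this split, the business logic that decides what to say is written once, and only the renderers differ per platform.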

Slide 29

Slide 29 text

Challenges ▷ Text-to-Speech (TTS) is still king ○ Google didn't even add support for their native audio player until February 2018 ▷ No access to the user's location ▷ Error handling is interesting! ○ User might not even trigger your skill

Slide 30

Slide 30 text

Conclusions ▷ The code is not hard ▷ Understanding platform limitations and user expectations is

Slide 31

Slide 31 text

No content

Slide 32

Slide 32 text

Open source opportunities ▷ Would it be helpful to have a formalized framework?

Slide 33

Slide 33 text

Open source opportunities ▷ Would it be helpful to have a formalized framework? ○ Not really. The code is not hard. ▷ What we struggle with the most: QA ○ We need something like Selenium or Nightwatch.js for voice UI

Slide 34

Slide 34 text

P.S. Are you excited about QA for voice UI? … 'cause we're hiring! n.pr/tech-jobs

Slide 35

Slide 35 text

Resources from NPR ▷ NPR + Edison Research Smart Audio Report ▷ Talking Back To Your Radio: How We Approached Voice UI (npr.design) ▷ How To Prototype For Audio-Rich Voice Experiences Without Really Trying (npr.design) ▷ My talk on the ethics of voice UI

Slide 36

Slide 36 text

Resources from others ▷ Alexa Skill Blueprints (Amazon) ▷ Conversation design (Google) ▷ Designing Voice Experiences (Smashing Mag) ▷ Storyline: Create Alexa skills without coding ▷ Intelligent Assistants Have Poor Usability: A User Study of Alexa, Google Assistant, and Siri (Nielsen Norman Group)

Slide 37

Slide 37 text

Thank you! @xiehan https://npr.codes