
[FEF] Voice: The New Frontend


Nara Kasbergen

April 03, 2019


  1. Nara Kasbergen (@xiehan) #FrontEndFest Wednesday, April 3, 2019

  2. None
  3. None
  4. Now: Then:

  5. None
  6. • Vision of the future
    • How we got here
    • Where we are today
    • How to build for voice today
    • Lessons learned
    • AMA (Ask Me Anything)
  7. None
  8. None
  9. None
  10. None
  11. None
  12. None
  13. None
  14. Natural language processing (NLP), definition: the ability of a computer program to understand human language as it is spoken. NLP is a component of artificial intelligence (AI). (source)
    • circa 1960
    • first major advances in the 1980s
  15. definition: Machine learning is an application of artificial intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed. Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves. (source)
    • circa 1950s
    • first major advances in the 1980s
  16. https://hackernoon.com/moores-law-is-alive-and-well-adc010ea7a63

  17. None
  18. None
  19. None
  20. None
  21. None
  22. Chatbot, definition: a computer program or an artificial intelligence which conducts a conversation via auditory or textual methods. Such programs are often designed to convincingly simulate how a human would behave as a conversational partner, thereby passing the Turing test. (source)
    • circa 1966 (ELIZA)
    • first major advances in the early 2000s
  23. None
  24. None
  25. None
  26. None
  27. None
  28. None
  29. None
  30. None
  31. Web Speech API:
    • Speech recognition
    • Speech synthesis (TTS)
    • W3C Community Group specification was published in 2012
    • SpeechRecognition interface is currently supported only in Chrome, as an experimental feature
    • Uses Google's servers to convert speech to text (requires an Internet connection)
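Because the SpeechRecognition interface is only available in some browsers, a page would normally feature-detect it before use. The following is a minimal sketch under stated assumptions: `findSpeechRecognition`, `listenOnce`, and the local `RecognitionLike` typing are hypothetical names introduced here so the example compiles without DOM typings, and only the `SpeechRecognition` / `webkitSpeechRecognition` globals and the `lang`, `interimResults`, `onresult`, and `start()` members come from the Web Speech API itself.

```typescript
// Minimal local typing (an assumption for this sketch) covering only the members used below.
interface RecognitionLike {
  lang: string;
  interimResults: boolean;
  onresult:
    | ((event: { results: ArrayLike<ArrayLike<{ transcript: string }>> }) => void)
    | null;
  start(): void;
}
type RecognitionCtor = new () => RecognitionLike;

// Looks for the constructor on the given global scope.
// Chrome exposes it with a webkit prefix, so both names are checked.
function findSpeechRecognition(scope: Record<string, unknown>): RecognitionCtor | null {
  const ctor = scope["SpeechRecognition"] ?? scope["webkitSpeechRecognition"];
  return typeof ctor === "function" ? (ctor as unknown as RecognitionCtor) : null;
}

// Starts one recognition session and hands the first transcript to the callback.
// Returns false when recognition is unavailable so the caller can fall back to text input.
function listenOnce(scope: Record<string, unknown>, onText: (text: string) => void): boolean {
  const Ctor = findSpeechRecognition(scope);
  if (Ctor === null) return false;
  const recognition = new Ctor();
  recognition.lang = "en-US";
  recognition.interimResults = false;
  recognition.onresult = (event) => {
    const first = event.results[0]?.[0];
    if (first !== undefined) onText(first.transcript);
  };
  recognition.start();
  return true;
}
```

In a browser, the scope argument would be the global object (for example `window` cast to `Record<string, unknown>`); passing it in explicitly keeps the sketch testable outside the browser.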
  32. "Common Voice is Mozilla's initiative to help teach machines how real people speak." (source)
    • Publicly open dataset
    • Upload recordings of your voice
    • Help reduce bias in Natural Language Processing (NLP) & machine learning
  33. Mozilla Scout Mycroft

  34. NLP/ML/AI Consumer applications

  35. None
  36. None
  37. it's not about the code

  38. • node.js
    • serverless (AWS Lambda or Google Cloud Functions)
    • lightweight database (e.g. Amazon DynamoDB)
    • CI server of choice (e.g. Travis, Jenkins, etc.)
    • unit testing framework of choice (e.g. Jest)
    • TypeScript…?
  39. Alexa node.js SDK: github.com/alexa/alexa-skills-kit-sdk-for-nodejs
    Actions on Google node.js SDK: github.com/actions-on-google/actions-on-google-nodejs

  40. Alexa Google / Dialogflow

  41. None
  42. invocation name

  43. None
  44. None
  45. intent sample utterance sample utterance

  46. None
  47. intent slot (or entity)
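The terms on the last few slides (invocation name, intent, sample utterance, slot) fit together roughly as in the sketch below. It is an illustration only: the `InteractionModel` shape is a simplified, hypothetical structure loosely modeled on how the platforms describe a voice app, and the `coffee helper` app, `OrderCoffeeIntent`, `CoffeeSize`, and `undeclaredSlots` are made-up names. Slots are written in curly braces inside the sample utterances.

```typescript
// Simplified, hypothetical shape of a voice app's interaction model.
interface Slot { name: string; type: string; }
interface Intent { name: string; samples: string[]; slots: Slot[]; }
interface InteractionModel { invocationName: string; intents: Intent[]; }

const model: InteractionModel = {
  // What the user says to open the app, e.g. "open coffee helper".
  invocationName: "coffee helper",
  intents: [
    {
      name: "OrderCoffeeIntent",
      // Sample utterances: different phrasings that should map to the same intent.
      samples: ["order a {size} coffee", "get me a {size} coffee", "I want coffee"],
      // Slots (entities): the variable parts of an utterance.
      slots: [{ name: "size", type: "CoffeeSize" }],
    },
  ],
};

// Returns slot names used in sample utterances but not declared on the intent.
function undeclaredSlots(intent: Intent): string[] {
  const declared = intent.slots.map((slot) => slot.name);
  const missing: string[] = [];
  for (const sample of intent.samples) {
    const pattern = /\{(\w+)\}/g;
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(sample)) !== null) {
      const name = match[1];
      if (name !== undefined && declared.indexOf(name) === -1 && missing.indexOf(name) === -1) {
        missing.push(name);
      }
    }
  }
  return missing;
}
```

A check like `undeclaredSlots` is one way to catch a mistyped slot name before uploading a model; whether it is worth having depends on how the model is maintained.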

  48. • JSON event with intent name and slot(s)
    • Handler function mapped to that intent
    • Use SDK to produce a response with speech, audio, etc.
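The request flow on this slide can be sketched without any SDK. Everything below is an assumption made for illustration: `IntentEvent` and `VoiceResponse` are simplified, hypothetical shapes (real platform payloads carry many more fields), and `handle` stands in for the routing that the Alexa and Actions on Google SDKs provide in their own ways.

```typescript
// Hypothetical, simplified event and response shapes.
interface IntentEvent {
  intent: string;
  slots: Record<string, string | undefined>;
}
interface VoiceResponse {
  speech: string;
  endSession: boolean;
}
type Handler = (event: IntentEvent) => VoiceResponse;

// One handler function per intent name.
const handlers = new Map<string, Handler>([
  [
    "OrderCoffeeIntent",
    (event) => {
      const size = event.slots["size"];
      // Missing slot: ask a follow-up question and keep the session open.
      if (size === undefined) {
        return { speech: "What size would you like?", endSession: false };
      }
      return { speech: `Ordering a ${size} coffee.`, endSession: true };
    },
  ],
]);

// Routes an incoming event to its handler, with a fallback for unknown intents.
function handle(event: IntentEvent): VoiceResponse {
  const handler = handlers.get(event.intent);
  if (handler === undefined) {
    return { speech: "Sorry, I can't help with that yet.", endSession: true };
  }
  return handler(event);
}
```

Keeping handlers as plain functions over plain objects also makes them easy to unit test, for example with Jest, without a device or simulator.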
  49. None
  50. None
  51. None
  52. None
  53. None
  54. None
  55. • Launch requests
    • Add intents
    • Wait for response
    • Handle slots
    • Add images to cards
    • Play simple audio
    • Use SSML
    • User login (OAuth2)
    • Persistent data
    • State management
    • Contexts (Google)
    • Dialog management (Alexa)
    • Advanced audio (Alexa)
    • Customize display on visual devices
    • Monetization / transactions
    • Request user permissions
    • Notifications
    • Internationalization
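As one small example from the list above, SSML (Speech Synthesis Markup Language) is XML, so dynamic text generally needs escaping before it is placed inside a `<speak>` element. This is a minimal sketch, not either platform's API: `escapeXml`, `toSsml`, and the `pauseMs` parameter are hypothetical helpers, while `<speak>` and `<break time="…"/>` are standard SSML elements.

```typescript
// Escapes characters that are reserved in XML so dynamic text cannot break the markup.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Joins sentences into one SSML document with a short pause between them.
function toSsml(sentences: string[], pauseMs = 300): string {
  const pause = `<break time="${pauseMs}ms"/>`;
  return `<speak>${sentences.map((s) => escapeXml(s)).join(pause)}</speak>`;
}
```

For instance, `toSsml(["Tom & Jerry", "Bye"])` escapes the ampersand and inserts a 300 ms break between the two sentences.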
  56. • By default, skills in development are only available to you
    • Certification process similar to mobile app store submissions
      ◦ ~48-hour turnaround on average
      ◦ Feedback is unpredictable!
      ◦ Respect existing brands
    • Can share a "beta" version with co-workers, friends, etc.
      ◦ Great for QA as well as hobby projects
  57. Alexa: • Alexa Skill Blueprints
    Google: • Actions on Google

  58. None
  59. • The code is not hard
    • What is hard:
      ◦ Learning about the platform limitations
      ◦ Managing stakeholder expectations
      ◦ Understanding & changing user behaviors
      ◦ QA
    • It's not just an engineering challenge!
  60. • You would think these platforms are ideal for audio… but they're not!
    • It's clear the companies designing these platforms are still focused primarily on Text-to-Speech (TTS)
    • The Actions on Google audio player is almost unusable
    • The Alexa audio player has many features but is very unintuitive when you're first working with it
  61. • Invisible errors:
      ◦ The Alexa service / Google / Dialogflow can reject a user's request
      ◦ If that happens, your app is not notified at all!
      ◦ Logs/analytics can't tell the whole story
      ◦ Users often don't understand why it failed
    • Real user testing is critical!
  62. None
  63. • Resilience
    • Patience
    • Honesty
    • Openness
    • Empathy
      ◦ Front-end developers have an edge!
  64. None
  65. Thank you! Keep in touch: @xiehan nara@nara.codes