In conversation with a browser

Voice assistants have taken off, but can we build our own with web technologies? I've been building bots for other platforms, but I wanted to investigate how well one could work in the browser. Can we talk to a web application and get results?

Let's dive into the Web Speech API, speech synthesis, and conversation design. We'll find out whether browsers can be virtual assistants or virtually useless.
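As a taste of what that looks like, here is a minimal sketch that listens with the Web Speech API's recognition interface and answers with speech synthesis. The `replyTo` and `startAssistant` names and the canned replies are assumptions made for illustration and are not taken from the talk. Browsers generally require a user gesture and microphone permission before recognition will start.

```javascript
// Hypothetical helper: decide what the assistant should say back.
function replyTo(transcript) {
  const heard = transcript.trim().toLowerCase();
  if (heard === '') return "Sorry, I didn't catch that.";
  if (heard.includes('hello')) return 'Hello! How can I help?';
  return `You said: ${transcript.trim()}`;
}

// Wire speech recognition to speech synthesis.
// Returns false when the runtime lacks either API (older browsers, Node.js).
function startAssistant(scope = globalThis) {
  const Recognition = scope.SpeechRecognition || scope.webkitSpeechRecognition;
  if (!Recognition || !scope.speechSynthesis) return false;

  const recognition = new Recognition();
  recognition.lang = 'en-US';
  recognition.addEventListener('result', (event) => {
    // First alternative of the first result, as on the talk's slides.
    const transcript = event.results[0][0].transcript;
    const utterance = new scope.SpeechSynthesisUtterance(replyTo(transcript));
    scope.speechSynthesis.speak(utterance);
  });
  recognition.start();
  return true;
}
```

Calling `startAssistant()` from a button's click handler is one way to satisfy the user-gesture requirement.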

--

Links:

http://github.com/philnash/web-assistant/

Browser voices: https://glitch.com/~browser-voices

An introduction to the MediaRecorder API: https://www.twilio.com/blog/mediastream-recording-api
Web recorder: https://glitch.com/~web-recorder

Speech to text with Watson in the browser: https://watson-speech.mybluemix.net/microphone-streaming.html
https://github.com/watson-developer-cloud/speech-javascript-sdk
https://github.com/anteloe/speech-polyfill
https://github.com/compulim/web-speech-cognitive-services

TensorFlow.js speech model: https://storage.googleapis.com/tfjs-speech-model-test/2019-01-03a/dist/index.html
ml5.js: https://ml5js.org/
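
A speech model running in the browser is what allows wake word detection without streaming the microphone to a third party. The sketch below assumes a recognizer from the `@tensorflow-models/speech-commands` package, created with `speechCommands.create('BROWSER_FFT')` and loaded with `await recognizer.ensureModelLoaded()`. The `topWord` and `listenForWakeWord` helpers and the 0.9 threshold are illustrative assumptions, and the wake word has to be one of the labels the model was trained on.

```javascript
// Return the label with the highest score, plus that score.
function topWord(labels, scores) {
  let best = 0;
  for (let i = 1; i < scores.length; i++) {
    if (scores[i] > scores[best]) best = i;
  }
  return { word: labels[best], score: scores[best] };
}

// `recognizer` is assumed to be a loaded speech-commands recognizer.
// `onWake` is a hypothetical callback, for example one that starts full
// speech recognition once the wake word has been heard.
function listenForWakeWord(recognizer, wakeWord, onWake) {
  const labels = recognizer.wordLabels();
  return recognizer.listen(
    (result) => {
      // result.scores lines up index by index with recognizer.wordLabels().
      const { word } = topWord(labels, result.scores);
      if (word === wakeWord) onWake();
    },
    { probabilityThreshold: 0.9 } // assumed value; tune for your environment
  );
}
```

Calling `recognizer.stopListening()` ends the session.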

Phil Nash

July 10, 2019

Transcript

  1. IN CONVERSATION WITH A BROWSER @philnash

  2. Phil Nash @philnash http://philna.sh philnash@twilio.com

  3. BOTS @philnash

  4. ELIZA: How do you do. Please tell me your problem.
    YOU: I've had too much beer and pizza
    ELIZA: You say you've had too much beer and pizza ?
    YOU: yes
    ELIZA: I understand.
    YOU: It's a problem
    ELIZA: What does that suggest to you ?
    @philnash
  5. @philnash

  6. *´¯`*.¸¸.*´¯`*ÃŇĞẸĹŜĎẸÃĎƑÃĹĹ*´¯`*.¸¸.*´¯`* @philnash

  7. IRC > SMS > SLACK @philnash

  8. None
  9. HOW DO WE BUILD OUR OWN? @philnash

  10. WEB APIS @philnash

  11. WEB SPEECH @philnash

  12. Speech Synthesis

    const utterance = new SpeechSynthesisUtterance(text);
    speechSynthesis.speak(utterance);

    @philnash
  13. Speech Synthesis https://glitch.com/~browser-voices @philnash

  14. @philnash

  15. Speech Recognition Start Speech Recognition leaving Mel JS how was

    all the pizza is really useful at because it just works in again it's reefs @philnash
  16. Speech Recognition

    const recognition = new webkitSpeechRecognition();
    recognition.addEventListener('result', event => {
      const result = event.results[0][0].transcript;
      console.log(result);
    });
    recognition.start();

    @philnash
  17. @philnash

  18. Speech Recognition: Sends all the data to Google Cloud Speech API @philnash
  19. MEDIARECORDER API @philnash

  20. MediaRecorder API [demo: Start recording button and audio player] @philnash
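  21. MediaRecorder API

    // getUserMedia needs constraints; ask for audio only
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    // the MediaRecorder option is named mimeType, not type
    const recorder = new MediaRecorder(stream, { mimeType: 'audio/webm' });
    const chunks = [];

    @philnash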
  22. MediaRecorder API

    recorder.addEventListener('dataavailable', event => {
      if (typeof event.data === 'undefined') return;
      if (event.data.size === 0) return;
      chunks.push(event.data);
    });

    @philnash
  23. MediaRecorder API

    recorder.addEventListener('stop', event => {
      const recording = new Blob(chunks, { type: 'audio/webm' });
    });

    @philnash
  24. @philnash

  25. MediaRecorder API https://glitch.com/~web-recorder @philnash

  26. Then what? Send the file to a speech to text service

    • Google Cloud Speech
    • Azure Cognitive Services
    • IBM Watson

    @philnash
  27. WEBAUDIO API @philnash

  28. @philnash

  29. AUDIOWORKLET + WEBSOCKETS @philnash

  30. DEMO @philnash

  31. Web Speech Alternatives/Polyfills

    https://github.com/watson-developer-cloud/speech-javascript-sdk
    https://github.com/anteloe/speech-polyfill
    https://github.com/compulim/web-speech-cognitive-services

    @philnash

  32. THIS IS ALL GREAT... BUT @philnash

  33. IT'S SENDING ALL THE MIC DATA TO A THIRD PARTY SERVICE @philnash
  34. WAKE WORDS @philnash

  35. MACHINE LEARNING @philnash

  36. Machine Learning TensorFlow.js ml5.js @philnash

  37. DEMO @philnash

  38. CONVERSATION DESIGN @philnash

  39. SPEAK YOUR BOT CONVERSATIONS OUT LOUD WITH SOMEONE ELSE @philnash

  40. WHAT DO WE DO WITH THIS? @philnash

  41. TECHNICAL JOURNEY @philnash

  42. WEB PLATFORM @philnash

  43. EXPERIMENTATION + FREEDOM @philnash

  44. None
  45. @philnash

  46. WEB ASSISTANT @philnash

  47. Web Assistant https://github.com/philnash/web-assistant/ @philnash

  48. THIS IS JUST THE START OF THE JOURNEY @philnash

  49. @philnash

  50. Thanks! @philnash http://philna.sh philnash@twilio.com
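
Slides 21 to 26 record audio with the MediaRecorder API and then hand the file to a speech to text service. Pulled together, that flow might look like the sketch below. The `hasAudio` and `recordAndTranscribe` names, the fixed recording duration, the `endpoint` parameter, and the JSON response are all assumptions; a real service such as the ones listed on slide 26 would have its own request format and would normally be called through your own server so that credentials stay private. Support for `audio/webm` varies by browser and can be checked with `MediaRecorder.isTypeSupported`.

```javascript
// True when a dataavailable event carries audio worth keeping.
function hasAudio(event) {
  return typeof event.data !== 'undefined' && event.data.size > 0;
}

// Record from the microphone for durationMs, then POST the recording to a
// hypothetical transcription endpoint that replies with JSON.
async function recordAndTranscribe(durationMs, endpoint) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream, { mimeType: 'audio/webm' });
  const chunks = [];

  recorder.addEventListener('dataavailable', (event) => {
    if (hasAudio(event)) chunks.push(event.data);
  });

  // Resolves once the recorder has delivered its final chunk and stopped.
  const stopped = new Promise((resolve) => {
    recorder.addEventListener('stop', resolve, { once: true });
  });

  recorder.start();
  setTimeout(() => recorder.stop(), durationMs);
  await stopped;

  // Release the microphone so the browser's recording indicator turns off.
  stream.getTracks().forEach((track) => track.stop());

  const recording = new Blob(chunks, { type: 'audio/webm' });
  const response = await fetch(endpoint, {
    method: 'POST',
    headers: { 'Content-Type': 'audio/webm' },
    body: recording,
  });
  return response.json();
}
```

In a page, `recordAndTranscribe(3000, '/transcribe')` would record three seconds of audio, where `/transcribe` stands in for whatever route your own server exposes.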