In conversation with a browser - at SydJS

Phil Nash
September 18, 2019

Voice assistants have taken off, but can we build our own with web technologies? I've been building bots for other platforms, but I wanted to investigate how well one could work in the browser. Can we talk to a web application and get results?

Let's dive into the Web Speech API, speech synthesis, and conversation design. We'll find out whether browsers can be virtual assistants or virtually useless.

--

Links:

http://github.com/philnash/web-assistant/

Browser voices: https://glitch.com/~browser-voices

An introduction to the MediaRecorder API: https://www.twilio.com/blog/mediastream-recording-api
Web recorder: https://glitch.com/~web-recorder

Speech to text with Watson in the browser: https://watson-speech.mybluemix.net/microphone-streaming.html
https://github.com/watson-developer-cloud/speech-javascript-sdk
https://github.com/anteloe/speech-polyfill
https://github.com/compulim/web-speech-cognitive-services

TensorFlow speech model: https://storage.googleapis.com/tfjs-speech-model-test/2019-01-03a/dist/index.html
ml5js: https://ml5js.org/

Transcript

  1. IN CONVERSATION WITH A BROWSER @philnash

  2. Phil Nash @philnash http://philna.sh philnash@twilio.com

  3. BOTS @philnash

  4. ELIZA: Please tell me what's been bothering you.
    YOU: I'm scared of public speaking.
    ELIZA: Do you know anyone else who is scared of public speaking ?
    YOU: Lots of people
    ELIZA: Do you feel strongly about discussing such things ?
    YOU: Only with you
    ELIZA: You're not really talking about me -- are you ?
    YOU: No, I'm the one who is scared.
    ELIZA: Why 'no' ?
    @philnash
  5. @philnash

  6. *´¯`*.¸¸.*´¯`*ÃŇĞẸĹŜĎẸÃĎƑÃĹĹ*´¯`*.¸¸.*´¯`* @philnash

  7. IRC > SMS > SLACK @philnash

  8. None
  9. HOW DO WE BUILD OUR OWN? @philnash

  10. WEB APIS @philnash

  11. WEB SPEECH @philnash

  12. Speech Synthesis
    Hello everyone
    const utterance = new SpeechSynthesisUtterance(text);
    speechSynthesis.speak(utterance);
    @philnash
  13. Speech Synthesis https://glitch.com/~browser-voices @philnash

  14. @philnash

  15. Speech Recognition
    Start Speech Recognition
    start listening to me and probably putting things into that textarea there eventually I can get it going again I can see he
    @philnash
  16. Speech Recognition
    const recognition = new webkitSpeechRecognition();
    recognition.addEventListener('result', event => {
      const result = event.results[0][0].transcript;
      console.log(result);
    });
    recognition.start();
    @philnash
  17. @philnash

  18. Speech Recognition Sends all the data to Google Cloud Speech API @philnash
  19. MEDIARECORDER API @philnash

  20. MediaRecorder API Start recording 0:00 0:00 @philnash

  21. MediaRecorder API
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    const recorder = new MediaRecorder(stream, { mimeType: 'audio/webm' });
    const chunks = [];
    @philnash
  22. MediaRecorder API
    recorder.addEventListener('dataavailable', event => {
      if (typeof event.data === 'undefined') return;
      if (event.data.size === 0) return;
      chunks.push(event.data);
    });
    @philnash
  23. MediaRecorder API
    recorder.addEventListener('stop', event => {
      const recording = new Blob(chunks, { type: 'audio/webm' });
    });
    @philnash
  24. @philnash

  25. MediaRecorder API https://glitch.com/~web-recorder @philnash

  26. Then what? Send the file to a speech to text service
    • Google Cloud Speech
    • Azure Cognitive Services
    • IBM Watson
    @philnash
  27. WEBAUDIO API @philnash

  28. @philnash

  29. AUDIOWORKLET + WEBSOCKETS @philnash

  30. DEMO @philnash

  31. Web Speech Alternatives/Polyfills https://github.com/watson-developer-cloud/speech-javascript-sdk https://github.com/anteloe/speech-polyfill https://github.com/compulim/web-speech-cognitive-services @philnash

  32. THIS IS ALL GREAT... BUT @philnash

  33. IT'S SENDING ALL THE MIC DATA TO A THIRD PARTY SERVICE @philnash
  34. WAKE WORDS @philnash

  35. MACHINE LEARNING @philnash

  36. Machine Learning TensorFlow.js ml5.js @philnash

  37. DEMO @philnash

  38. CONVERSATION DESIGN @philnash

  39. SPEAK YOUR BOT CONVERSATIONS OUT LOUD WITH SOMEONE ELSE @philnash

  40. WHAT DO WE DO WITH THIS? @philnash

  41. TECHNICAL JOURNEY @philnash

  42. WEB PLATFORM @philnash

  43. EXPERIMENTATION + FREEDOM @philnash

  44. None
  45. @philnash

  46. WEB ASSISTANT @philnash

  47. Web Assistant https://github.com/philnash/web-assistant/ @philnash

  48. THIS IS JUST THE START OF THE JOURNEY @philnash

  49. @philnash

  50. Thanks! @philnash http://philna.sh philnash@twilio.com
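
Slides 12 and 16 show speech synthesis and speech recognition as separate snippets. The following is a minimal sketch of how the two might be joined into a single listen-then-reply turn. It is not taken from the talk or from the web-assistant repository: the function names `getRecognitionConstructor`, `chooseReply` and `converseOnce` are hypothetical, the reply logic is a placeholder, and the global object is passed in as `scope` (an assumption made so the code can also be exercised outside a browser).

```javascript
// Pick whichever recognition constructor the environment exposes, if any.
// The slides use the prefixed webkitSpeechRecognition; the unprefixed
// SpeechRecognition is the name used in the specification.
function getRecognitionConstructor(scope) {
  return scope.SpeechRecognition || scope.webkitSpeechRecognition || null;
}

// Hypothetical placeholder reply logic; a real assistant would do far more.
function chooseReply(transcript) {
  const text = transcript.trim().toLowerCase();
  if (text === '') return "Sorry, I didn't catch that.";
  if (text.includes('hello')) return 'Hello everyone';
  return `You said: ${transcript.trim()}`;
}

// Listen for one phrase, then speak a reply.
// Returns false when the Web Speech API is not available in `scope`.
function converseOnce(scope) {
  const Recognition = getRecognitionConstructor(scope);
  if (!Recognition || !scope.speechSynthesis || !scope.SpeechSynthesisUtterance) {
    return false;
  }

  const recognition = new Recognition();
  recognition.addEventListener('result', event => {
    // First alternative of the first result, as on slide 16.
    const transcript = event.results[0][0].transcript;
    const utterance = new scope.SpeechSynthesisUtterance(chooseReply(transcript));
    scope.speechSynthesis.speak(utterance);
  });
  recognition.start();
  return true;
}
```

In a page this would typically be called as `converseOnce(window)` from a click handler, since browsers generally ask for microphone permission and may expect a user gesture before listening or speaking. As slide 18 notes for Chrome, the recognition step may send audio to a remote service.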