In conversation with a browser - at SydJS

Phil Nash
September 18, 2019

Voice assistants have taken off, but can we build our own with web technologies? I've been building bots for other platforms, but I wanted to investigate how well one could work in the browser. Can we talk to a web application and get results?

Let's dive into the Web Speech API, speech synthesis, and conversation design. We'll find out whether browsers can be virtual assistants or virtually useless.

--

Links:

http://github.com/philnash/web-assistant/

Browser voices: https://glitch.com/~browser-voices

An introduction to the MediaRecorder API: https://www.twilio.com/blog/mediastream-recording-api
Web recorder: https://glitch.com/~web-recorder

Speech to text with Watson in the browser: https://watson-speech.mybluemix.net/microphone-streaming.html
https://github.com/watson-developer-cloud/speech-javascript-sdk
https://github.com/anteloe/speech-polyfill
https://github.com/compulim/web-speech-cognitive-services

TensorFlow speech model: https://storage.googleapis.com/tfjs-speech-model-test/2019-01-03a/dist/index.html
ml5js: https://ml5js.org/

Transcript

  1. ELIZA: Please tell me what's been bothering you.

    YOU: I'm scared of public speaking.
    ELIZA: Do you know anyone else who is scared of public speaking ?
    YOU: Lots of people
    ELIZA: Do you feel strongly about discussing such things ?
    YOU: Only with you
    ELIZA: You're not really talking about me -- are you ?
    YOU: No, I'm the one who is scared.
    ELIZA: Why 'no' ?
    @philnash
  2. Speech Recognition

    [Start Speech Recognition]
    start listening to me and probably putting things into that textarea there eventually I can get it going again I can see he
  3. Speech Recognition

    const recognition = new webkitSpeechRecognition();
    recognition.addEventListener('result', event => {
      const result = event.results[0][0].transcript;
      console.log(result);
    });
    recognition.start();
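  4. MediaRecorder API

    // Ask for microphone access; getUserMedia requires a constraints object
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    // Record the stream as WebM audio
    const recorder = new MediaRecorder(stream, { mimeType: 'audio/webm' });
    const chunks = [];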
  5. MediaRecorder API

    recorder.addEventListener('dataavailable', event => {
      if (typeof event.data === 'undefined') return;
      if (event.data.size === 0) return;
      chunks.push(event.data);
    });
  6. MediaRecorder API

    recorder.addEventListener('stop', event => {
      const recording = new Blob(chunks, { type: 'audio/webm' });
    });
  7. Then what?

    Send the file to a speech to text service
    • Google Cloud Speech
    • Azure Cognitive Services
    • IBM Watson
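
The Speech Recognition snippet on slide 3 constructs `webkitSpeechRecognition` directly and logs only the first alternative of the first result. As a rough sketch of a more defensive variant, the helpers below look for the unprefixed `SpeechRecognition` constructor before falling back to the prefixed one, and join the first alternative of every result into a single string. The names `getRecognitionConstructor` and `transcriptFromResults` are illustrative assumptions and do not come from the talk.

```javascript
// Hypothetical helper: find the recognition constructor, preferring the
// unprefixed name and falling back to the webkit-prefixed one from slide 3.
// Returns null where the API is unavailable.
function getRecognitionConstructor(scope = globalThis) {
  return scope.SpeechRecognition || scope.webkitSpeechRecognition || null;
}

// Hypothetical helper: join the first alternative of every result.
// `results` is array-like, and so is each entry (a list of alternatives).
function transcriptFromResults(results) {
  return Array.from(results, result => result[0].transcript).join('');
}

// Browser-only wiring; skipped entirely where the API is missing.
const Recognition = getRecognitionConstructor();
if (Recognition) {
  const recognition = new Recognition();
  recognition.addEventListener('result', event => {
    console.log(transcriptFromResults(event.results));
  });
  recognition.start();
}
```

Taking the scope as a parameter keeps the feature detection easy to exercise outside a browser, where neither constructor exists.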
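
The three MediaRecorder slides (4 to 6) can be read as one flow: open the microphone, collect chunks, and assemble a Blob when recording stops. The sketch below strings them together under a few assumptions that are not in the deck: the function names `chooseMimeType` and `startRecording`, the candidate list of `audio/webm` then `audio/mp4`, and the choice to release the microphone on stop. It uses `MediaRecorder.isTypeSupported` because not every browser records to every container.

```javascript
// Hypothetical helper: return the first candidate the recorder supports,
// or '' so the browser falls back to its default format.
function chooseMimeType(candidates, isTypeSupported) {
  return candidates.find(type => isTypeSupported(type)) || '';
}

// Browser-only sketch: start recording from the microphone.
// Returns a stop() function and a promise for the finished Blob.
async function startRecording() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const mimeType = chooseMimeType(
    ['audio/webm', 'audio/mp4'], // assumed candidates, in order of preference
    type => MediaRecorder.isTypeSupported(type)
  );
  const recorder = new MediaRecorder(stream, mimeType ? { mimeType } : {});
  const chunks = [];

  // Keep every non-empty chunk the recorder hands over.
  recorder.addEventListener('dataavailable', event => {
    if (event.data && event.data.size > 0) chunks.push(event.data);
  });

  // Resolve with the assembled recording once the recorder stops.
  const recording = new Promise(resolve => {
    recorder.addEventListener('stop', () => {
      stream.getTracks().forEach(track => track.stop()); // release the mic
      resolve(new Blob(chunks, { type: recorder.mimeType }));
    });
  });

  recorder.start();
  return { stop: () => recorder.stop(), recording };
}
```

A caller would await `startRecording()`, call the returned `stop()` when the user finishes speaking, and then await `recording` to get the Blob.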
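
For the "Then what?" step on slide 7, one possible shape is to POST the recorded Blob to a server you control, which then relays it to whichever speech to text service you choose so that API credentials stay off the page. Everything specific below is an assumption for illustration: the `/transcribe` endpoint, the `{ transcript }` response shape, and the function names. None of it is the API of Google Cloud Speech, Azure Cognitive Services, or IBM Watson.

```javascript
// Hypothetical helper: describe the upload as plain data so it can be
// inspected before anything is sent. The endpoint is an assumed relay
// route on your own server, not a real service URL.
function buildTranscriptionRequest(recording, endpoint = '/transcribe') {
  return {
    url: endpoint,
    options: {
      method: 'POST',
      headers: {
        'Content-Type': recording.type || 'application/octet-stream',
      },
      body: recording,
    },
  };
}

// Sends the recording and returns the transcript text.
// Assumes the relay answers with JSON shaped like { transcript: '...' }.
async function transcribe(recording, endpoint = '/transcribe') {
  const { url, options } = buildTranscriptionRequest(recording, endpoint);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Transcription failed with status ${response.status}`);
  }
  const { transcript } = await response.json();
  return transcript;
}
```

Sending the Blob as the raw request body keeps the sketch small; a multipart form upload would work equally well if the relay expects one.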