Talking and listening to web pages - WebTech Conference 2014

As web developers, our job is to build nice, fast, and reliable websites, web apps, or web services. But our role isn't limited to this. We have to build these products not only for our ideal users but for as wide a range of people as possible. Today's browsers help us achieve this goal by providing APIs designed with this purpose in mind. One of these APIs is the Web Speech API, which provides speech input and text-to-speech output in a web browser.

In this talk you'll learn what the Web Speech API is and how it can drastically improve the way users, especially those with disabilities, perform tasks in your web pages.

Aurelio De Rosa

October 27, 2014



    WRITE FOR: SitePoint, Tuts+, ModernWeb, Telerik, php[architect], Web & PHP Magazine
  2. WHAT WE'LL COVER

    Natural language processing (NLP)
    Why it matters
    The Web Speech API
    Speech recognition
    Speech synthesis
    Issues and inconsistencies
    Demo
  3. NATURAL LANGUAGE PROCESSING (NLP)

    A field of computer science, artificial intelligence, and linguistics concerned with the interactions between computers and human (natural) languages.
  4. NATURAL LANGUAGE PROCESSING (NLP)

    It all started in 1950, when Alan Turing published an article titled “Computing Machinery and Intelligence” where he proposed what is now called the Turing test.
  5. VOICEXML

    It's an XML language for writing Web pages you interact with by listening to spoken prompts and other forms of audio that you can control by providing spoken inputs. Specifications: http://www.w3.org/TR/voicexml30/
  6. VOICEXML: EXAMPLE

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <vxml version="3.0" lang="en">
      <form>
        <field name="city">
          <prompt>Where do you want to travel to?</prompt>
          <option>New York</option>
          <option>London</option>
          <option>Tokyo</option>
        </field>
        <block>
          <submit next="http://www.test.com" namelist="city"/>
        </block>
      </form>
    </vxml>
  7. JAVA APPLET

    It's an application written in Java and delivered to users in the form of bytecode through a web page. The applet is then executed within a Java Virtual Machine (JVM) in a process separate from the browser itself.
  8. WHY YOU SHOULD CARE

    A step ahead to fill the gap with native apps
    Improve user experience
    Feature needed by some applications, such as navigators
    Help people with disabilities
  9. “DEMO IT OR IT DIDN'T HAPPEN”™

    Register on our website (a form with Name, Surname, and Nationality fields and a Start button). This demo can be found at https://jsbin.com/faguji/watch?output
  10. WEB SPEECH API

    The Web Speech API allows you to deal with two aspects of the computer-human interaction: Automatic Speech Recognition (ASR) and Text-to-Speech (TTS). Specifications: https://dvcs.w3.org/hg/speech-api/raw-file/tip/webspeechapi.html
  11. WEB SPEECH API

    Introduced at the end of 2012
    Defines two interfaces: one for recognition and one for synthesis
    Requires the user's permission before acquiring audio
    Agnostic of the underlying technology
  12. SPEECH RECOGNITION

    There are two types of recognition available: one-shot and continuous. The first stops as soon as the user stops talking; the second must be stopped programmatically. To instantiate a new speech recognizer, call the SpeechRecognition() constructor (in Chrome, which only supports the prefixed version, webkitSpeechRecognition()):

    var recognizer = new SpeechRecognition();
  13. SPEECH RECOGNITION: BROWSERS SUPPORT

    Explorer   Chrome          Safari   Firefox   Opera
    None       25+ (-webkit)   None     None      None

    Data updated to 24th October 2014
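The table means that, in 2014, only Chrome offered recognition, and only under the prefixed name webkitSpeechRecognition, so calling new SpeechRecognition() directly fails there. A common pattern is to pick whichever constructor exists and fall back gracefully otherwise. The sketch below relies only on those two global names; getSpeechRecognition is a hypothetical helper, and the global object is passed in only so the function can be exercised outside a browser.

```javascript
// Hypothetical helper: return the speech recognition constructor the
// browser exposes, preferring the standard name over Chrome's prefixed one.
// `win` is the global object (`window` in a page).
function getSpeechRecognition(win) {
  return win.SpeechRecognition || win.webkitSpeechRecognition || null;
}

// Usage in a page:
// var Recognition = getSpeechRecognition(window);
// if (Recognition) {
//   var recognizer = new Recognition();
// } else {
//   // no voice input available: keep the plain form working
// }
```

Returning null instead of throwing lets the page decide how to degrade, which matters given how many browsers in the table have no support at all.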
  14. SPEECH RECOGNITION: PROPERTIES

    continuous, grammars*, interimResults, lang, maxAlternatives, serviceURI**

    *Up to Chrome 38, adding a grammar to the grammars property does nothing. This happens because, according to the specification, the group is currently discussing options for which grammar formats should be supported, how builtin grammar types are specified, and default grammars when not specified.
    **serviceURI isn't exposed by any browser.
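A minimal sketch of setting the usable properties in one place. The helper name configureRecognizer and its options object are hypothetical. The defaults for continuous (false, i.e. one-shot), interimResults (false) and maxAlternatives (1) follow the specification's defaults; the 'en-US' fallback for lang is an assumption, since the specification otherwise falls back to the page's language. grammars and serviceURI are left alone because, per the notes above, they had no effect or were not exposed.

```javascript
// Hypothetical helper: apply the recognition properties from an options
// object. `recognizer` is a (webkit)SpeechRecognition instance in a page.
function configureRecognizer(recognizer, options) {
  var opts = options || {};
  // false = one-shot: recognition stops when the user stops talking
  recognizer.continuous = opts.continuous === true;
  // false = only final results are delivered to the result event
  recognizer.interimResults = opts.interimResults === true;
  // BCP 47 language tag; 'en-US' fallback is an assumption
  recognizer.lang = opts.lang || 'en-US';
  // how many alternative transcripts each result may carry
  recognizer.maxAlternatives = opts.maxAlternatives || 1;
  return recognizer;
}
```

For dictation, for example, a page could call configureRecognizer(recognizer, { continuous: true, interimResults: true }) to keep listening and show partial text as the user speaks.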
  15. SPEECH RECOGNITION: EVENTS

    start, end*, audiostart, audioend, soundstart, soundend, speechstart, speechend, result, nomatch, error

    *Chrome 38 on Windows 8.1 doesn't fire the result or the error event before the end event when only noises are produced (issue #428873).
  16. “IT'S SHOWTIME!”

    Demo controls: Start, Stop. This demo can be found at

  17. SPEECH RECOGNITION: RESULTS

    Results are obtained as an object (that implements the SpeechRecognitionEvent interface) passed as the first argument of the handler attached to the result event.
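Inside that event, results is a list of results; each result is a list of alternatives (up to maxAlternatives), each with a transcript and a confidence, and each result has an isFinal flag. event.resultIndex is the index of the first result that changed. The sketch below walks the changed results and keeps the most confident alternative of each; readResults is a hypothetical name, and it uses only indexed access and length, so plain arrays can stand in for the real lists.

```javascript
// Hypothetical helper: summarize the results that changed in a
// SpeechRecognitionEvent, keeping the most confident alternative of each.
function readResults(event) {
  var items = [];
  for (var i = event.resultIndex; i < event.results.length; i++) {
    var result = event.results[i];
    var best = result[0];
    // do not assume alternatives are sorted; pick the highest confidence
    for (var j = 1; j < result.length; j++) {
      if (result[j].confidence > best.confidence) {
        best = result[j];
      }
    }
    items.push({
      transcript: best.transcript,
      confidence: best.confidence,
      isFinal: result.isFinal
    });
  }
  return items;
}

// Usage in a page:
// recognizer.addEventListener('result', function(event) {
//   console.log(readResults(event));
// });
```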
  18. PROBLEM: SOMETIMES RECOGNITION SUCKS!

    Imagine a user of your website or web app says a command but the recognizer returns the wrong string. Your system is good and asks the user to repeat it, but the recognition fails again. How can you get out of this loop?
  19. LEVENSHTEIN DISTANCE: EXAMPLE

    Commands available: "Send email", "Call"
    Names in the phonebook: "Aurelio De Rosa", "Annarita Tranfici", "John Doe"
    Demo fields: Recognized text, Updated text; Start button. This demo can be found at https://jsbin.com/tevogu/watch?output
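One way out of the loop is to compare what the recognizer returned with the short list of strings the app actually expects (commands, phonebook names) and pick the closest one. The Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions that turn one string into another. Below is a standard dynamic-programming implementation plus a hypothetical closestMatch() helper; the maxDistance threshold is an assumption, not something taken from the demo.

```javascript
// Levenshtein distance with the classic dynamic-programming table, keeping
// only the previous row.
function levenshtein(a, b) {
  var prev = [];
  for (var k = 0; k <= b.length; k++) {
    prev[k] = k; // distance from "" to the first k chars of b
  }
  for (var i = 1; i <= a.length; i++) {
    var curr = [i]; // distance from the first i chars of a to ""
    for (var j = 1; j <= b.length; j++) {
      var cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: return the candidate closest to the recognized text
// (case-insensitive), or null if even the best needs more than maxDistance
// edits, so the app can still ask the user to repeat.
function closestMatch(text, candidates, maxDistance) {
  var best = null;
  var bestDistance = Infinity;
  candidates.forEach(function (candidate) {
    var d = levenshtein(text.toLowerCase(), candidate.toLowerCase());
    if (d < bestDistance) {
      best = candidate;
      bestDistance = d;
    }
  });
  return bestDistance <= maxDistance ? best : null;
}
```

For instance, a misrecognized "annarita tranfichi" is one insertion away from "Annarita Tranfici", so closestMatch() with a threshold of 3 returns the phonebook entry instead of forcing another retry.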
  20. SPEECH SYNTHESIS

    Provides text-to-speech functionality in the browser. This is especially useful for blind people and those with visual impairments in general. The feature is exposed via the global speechSynthesis object, whose methods are called directly on it without instantiating anything.
  21. SPEECH SYNTHESIS: BROWSERS SUPPORT

    Explorer   Chrome   Safari   Firefox   Opera*
    None       33+      7+       None      None

    Data updated to 24th October 2014
    *In Opera the speechSynthesis object is exposed but does nothing! (issue DNA-28388)
  22. SPEECH SYNTHESIS: PROPERTIES

    pending, speaking, paused*

    *Up to Chrome 38, pausing the utterance isn't reflected in the value of the paused property (issue #425553).
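Since speechSynthesis.paused could not be trusted in Chrome up to version 38, one workaround is to track the state yourself. A minimal sketch, assuming every pause and resume in the page goes through the wrapper (a pause triggered elsewhere would not be seen); createSpeaker and its method names are hypothetical, and synth is passed in (window.speechSynthesis in a page) so a stand-in can be used.

```javascript
// Hypothetical wrapper that keeps its own paused flag instead of reading
// synth.paused, which older Chrome versions did not update.
function createSpeaker(synth) {
  var paused = false;
  return {
    speak: function (utterance) {
      synth.speak(utterance);
    },
    pause: function () {
      synth.pause();
      paused = true;
    },
    resume: function () {
      synth.resume();
      paused = false;
    },
    isPaused: function () {
      return paused;
    }
  };
}
```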
  23. SPEECH SYNTHESIS: METHODS

    speak()*, cancel(), pause(), resume(), getVoices()

    *Up to Chrome 38, speak() doesn't support SSML and doesn't strip unrecognized tags (issue #428902).
  24. SPEECH SYNTHESIS: UTTERANCE PROPERTIES

    lang, pitch*, rate, text**, voice, volume

    *Up to Chrome 38, changing the pitch property does nothing (issue #376280).
    **Up to Chrome 38, the text property can't be set to an SSML (Speech Synthesis Markup Language) document because it isn't supported and Chrome doesn't strip the unrecognized tags (issue #428902).
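Because speak() in these Chrome versions passes markup through instead of ignoring it, a defensive option is to strip tags before creating the utterance. This loses whatever the SSML was meant to express (emphasis, pauses) but keeps the tags away from the engine. stripTags is a hypothetical helper and deliberately naive: a regular expression, not an XML parser, and it does not decode entities such as &amp;.

```javascript
// Hypothetical helper: remove anything that looks like a markup tag and
// normalize the whitespace left behind.
function stripTags(text) {
  return text
    .replace(/<[^>]*>/g, ' ') // replace each tag with a space so words don't merge
    .replace(/\s+/g, ' ')     // collapse runs of whitespace
    .trim();
}

// Usage in a page:
// speechSynthesis.speak(new SpeechSynthesisUtterance(stripTags(ssmlText)));
```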
  25. SPEECH SYNTHESIS: UTTERANCE EVENTS

    start, end, pause, resume, boundary*, mark*, error

    *boundary and mark are not supported by any browser because they are fired by the interaction with SSML documents.
  26. SHOW ME TEH CODEZ

    To set the text to emit, we can either pass it when instantiating an utterance object or set it later using the text property.
  27. EXAMPLE 1

    var utterance = new SpeechSynthesisUtterance('Hello!');
    utterance.lang = 'en-US';
    utterance.rate = 1.2;
    utterance.addEventListener('end', function() {
      console.log('Speech completed');
    });
    speechSynthesis.speak(utterance);
  28. EXAMPLE 2

    var utterance = new SpeechSynthesisUtterance();
    utterance.text = 'Hello!';
    utterance.lang = 'en-US';
    utterance.rate = 1.2;
    utterance.addEventListener('end', function() {
      console.log('Speech completed');
    });
    speechSynthesis.speak(utterance);
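The end listener in both examples is the natural place to settle a Promise, which is what the Speech object of the interactive form needs from its speak() method. A minimal sketch, assuming a rejection on the utterance's error event is acceptable; the function name and signature are hypothetical, and synth and Utterance are injected (window.speechSynthesis and window.SpeechSynthesisUtterance in a page) so the function can run against stand-ins.

```javascript
// Hypothetical wrapper: speak `text` and resolve when the utterance fires
// `end`, or reject when it fires `error`.
function speak(synth, Utterance, text, options) {
  return new Promise(function (resolve, reject) {
    var utterance = new Utterance(text);
    var opts = options || {};
    if (opts.lang) utterance.lang = opts.lang;
    if (opts.rate) utterance.rate = opts.rate;
    utterance.addEventListener('end', function () {
      resolve();
    });
    utterance.addEventListener('error', function (event) {
      // event.error holds the error code in real SpeechSynthesisErrorEvents
      reject(new Error(event.error || 'speech synthesis error'));
    });
    synth.speak(utterance);
  });
}

// Usage in a page:
// speak(speechSynthesis, SpeechSynthesisUtterance, 'Hello!', { lang: 'en-US', rate: 1.2 })
//   .then(function() { console.log('Speech completed'); });
```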
  29. SPEECH SYNTHESIS: DEMO IT MAN!

    I know my voice isn't very sexy, but I still want to say that WebTech Conference 2014 is wonderful and the audience of my talk is even better. You all rock! This demo can be found at https://jsbin.com/cugavi/watch?output
  30. INTERACTIVE FORM: RECIPE

    Promises (to avoid callback hell)
    Speech recognition
    Speech synthesis

    The actual code is slightly different; I simplified it for brevity and to fit the screen.
  31. INTERACTIVE FORM: STEP 1 - HTML

    <form id="form">
      <label for="name" data-question="What's your name?">Name:</label>
      <input id="name" />
      <label for="surname" data-question="What's your surname?">Surname:</label>
      <input id="surname" />
      <!-- Other label/element pairs here -->
      <input id="btn-voice" type="submit" value="Start" />
    </form>

    object containing two methods, speak and recognize, that return a Promise.

    var Speech = {
      speak: function(text) {
        return new Promise(function(resolve, reject) {...});
      },
      recognize: function() {
        return new Promise(function(resolve, reject) {...});
      }
    };
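The slide elides the body of recognize(). One possible body is sketched below; it is not the author's code. It starts a one-shot recognizer and settles on whichever comes first: result (resolve with the transcript), error (reject), or end without a result (reject with 'no-result'), which also covers the Chrome 38 case where end arrives without a preceding result or error event. The recognizer is passed in as an assumption so the function can be exercised with a stand-in; in a page it would be a (webkit)SpeechRecognition instance.

```javascript
// Sketch of a recognize() body: resolve with the first transcript, reject
// on error or when the session ends without any result.
function recognize(recognizer) {
  return new Promise(function (resolve, reject) {
    var settled = false;
    function finish(fn, value) {
      if (settled) return; // only the first outcome counts
      settled = true;
      recognizer.removeEventListener('result', onResult);
      recognizer.removeEventListener('error', onError);
      recognizer.removeEventListener('end', onEnd);
      fn(value);
    }
    function onResult(event) {
      finish(resolve, event.results[event.resultIndex][0].transcript);
    }
    function onError(event) {
      finish(reject, new Error(event.error || 'recognition error'));
    }
    function onEnd() {
      // end without result or error: treat it as "nothing recognized"
      finish(reject, new Error('no-result'));
    }
    recognizer.addEventListener('result', onResult);
    recognizer.addEventListener('error', onError);
    recognizer.addEventListener('end', onEnd);
    recognizer.start();
  });
}
```

Rejecting on a silent end keeps the form's Promise chain from hanging forever when only noise is captured.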
  33. INTERACTIVE FORM: STEP 3 - JS 1/2

    function formData(i) {
      return promise.then(function() {
        return Speech.speak(fieldLabels[i].dataset.question);
      }).then(function() {
        return Speech.recognize().then(function(text) {
          document.getElementById(fieldLabels[i].getAttribute('for')).value = text;
        });
      });
    }
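  34. INTERACTIVE FORM: STEP 3 - JS 2/2

    var form = document.getElementById('form');
    form.addEventListener('submit', function(event) {
      // The Start button is a submit input: stop the browser from leaving the page
      event.preventDefault();
      /* Code to retrieve all the labels of the form */
      function formData(i) { /* code here */ }
      // Start from a resolved Promise; each formData(i) chains onto the previous one
      var promise = Promise.resolve();
      for (var i = 0; i < fieldLabels.length; i++) {
        promise = formData(i);
      }
      promise.then(function() {
        return Speech.speak('Thank you for filling...');
      }).catch(function(error) {
        alert(error);
      });
    });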
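The loop above chains one question/answer round per field through a shared, reassigned promise variable. The same sequencing can be written with Array.prototype.reduce and no shared state. A minimal sketch: askAll is a hypothetical name, fields is a hypothetical array of { id, question } objects (in the page, each label's for attribute and data-question), and speak/recognize are any functions returning Promises, such as the two methods of the Speech object.

```javascript
// Hypothetical helper: ask each question in order, wait for the spoken
// answer, and resolve with an object mapping field ids to answers.
function askAll(fields, speak, recognize) {
  var answers = {};
  return fields
    .reduce(function (chain, field) {
      // each round starts only after the previous round has finished
      return chain
        .then(function () { return speak(field.question); })
        .then(function () { return recognize(); })
        .then(function (text) { answers[field.id] = text; });
    }, Promise.resolve())
    .then(function () {
      return answers;
    });
}
```

In the page, the caller would then copy each answer into document.getElementById(id).value and speak the closing message.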
  35. DICTATION: RECIPE

    Speech recognition

    The actual code is slightly different; I simplified it for brevity and to fit the screen.
  36. DICTATION: STEP 2 - JS 1/3

    var recognizer = new SpeechRecognition();
    recognizer.interimResults = true;
    recognizer.continuous = true;
    var transcr = document.getElementById('transcription');
    var currTranscr = document.createElement('span');
    currTranscr.id = 'current-transcription';
  37. DICTATION: STEP 2 - JS 2/3

    recognizer.addEventListener('result', function(event) {
      currTranscr.textContent = '';
      var i = event.resultIndex;
      while (i < event.results.length) {
        var result = event.results[i++];
        if (result.isFinal) {
          transcr.removeChild(currTranscr);
          transcr.textContent += result[0].transcript;
          transcr.appendChild(currTranscr);
        } else {
          currTranscr.textContent += result[0].transcript;
        }
      }
    });
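The handler above mixes two concerns: deciding which text is final and which is still interim, and updating the DOM. A minimal sketch that pulls the first part into a pure function, walking the results from event.resultIndex and reading result[0].transcript just like the handler. The name applyResults and its return shape are hypothetical.

```javascript
// Hypothetical helper: given the text finalized so far and a `result`
// event, return the updated final text and the current interim text.
function applyResults(finalText, event) {
  var interimText = ''; // interim text is rebuilt on every event
  for (var i = event.resultIndex; i < event.results.length; i++) {
    var result = event.results[i];
    if (result.isFinal) {
      finalText += result[0].transcript;
    } else {
      interimText += result[0].transcript;
    }
  }
  return { finalText: finalText, interimText: interimText };
}
```

In the page, the handler would keep finalText between events and write the two strings into transcr and currTranscr; keeping the logic free of DOM calls makes it testable without a browser.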
  38. DICTATION: STEP 2 - JS 3/3

    var btnStart = document.getElementById('btn-start');
    btnStart.addEventListener('click', function() {
      transcr.textContent = '';
      transcr.appendChild(currTranscr);
      recognizer.start();
    });
    var btnStop = document.getElementById('btn-stop');
    btnStop.addEventListener('click', function() {
      recognizer.stop();
    });
  39. ONE LAST DEMO...

    Video courtesy of Szymon Nowak (@szimek): https://www.youtube.com/v=R8ejjVAZweg