Talking and listening to web pages - jsDay 2015

As web developers, our job is to build nice, fast, and reliable websites, web apps, or web services. But our role isn't limited to this. We have to build these products not only for our ideal users but for as wide a range of people as possible. Today's browsers help us achieve this goal by providing APIs created with it in mind. One of these APIs is the Web Speech API, which provides speech input and text-to-speech output features in a web browser.

In this talk you'll learn what the Web Speech API is and how it can drastically improve the way users, especially those with disabilities, perform tasks in your web pages.

Aurelio De Rosa

May 14, 2015

Transcript

  1. TALKING AND LISTENING
    TO WEB PAGES
    Aurelio De Rosa
    Verona, Italy - 14 May 2015

  2. WEB & APP DEVELOPER
    CONTRIBUTE(D) TO
    ...
    jQuery
    CanIUse
    PureCSS
    PUBLISH(ED) ON
    ...
    SitePoint
    Tuts+
    .NET magazine
    php [architect] magazine
    Telerik
    Web & PHP magazine

  3. AUTHORED BOOKS
    JQUERY IN ACTION (3RD EDITION)
    INSTANT JQUERY SELECTORS
    (Shameless self-promotion!)

  4. (image-only slide)

  5. WHAT WE'LL COVER
    Natural language processing (NLP)
    Why it matters
    The Web Speech API
    Speech recognition
    Speech synthesis
    Issues and inconsistencies
    Demo

  6. NATURAL LANGUAGE
    PROCESSING (NLP)
    A field of computer science, artificial intelligence, and linguistics
    concerned with the interactions between computers and human
    (natural) languages.

  7. NATURAL LANGUAGE PROCESSING (NLP)
    It all started in 1950 when Alan Turing published an article titled
    “Computing Machinery and Intelligence” where he proposed
    what is now called the Turing test.

  8. (image-only slide)

  9. IT'S NOT ALL ABOUT
    TEXT

  10. ONCE UPON A TIME...

  11. VOICEXML
    It's an XML language for writing web pages that you interact with by
    listening to spoken prompts and other forms of audio, and that you
    control by providing spoken input.
    Specifications: http://www.w3.org/TR/voicexml30/

  12. VOICEXML: EXAMPLE
    <?xml version="1.0" encoding="UTF-8"?>
    <vxml xmlns="http://www.w3.org/2001/vxml" version="2.1">
      <menu>
        <prompt>Where do you want to travel to?</prompt>
        <choice next="#newyork">New York</choice>
        <choice next="#london">London</choice>
        <choice next="#tokyo">Tokyo</choice>
      </menu>
    </vxml>
  13. JAVA APPLET
    It's an application written in Java and delivered to users in the
    form of bytecode through a web page. The applet is then
    executed within a Java Virtual Machine (JVM) in a process
    separate from the browser itself.

  14. WHY I CARE

  15. WHY YOU SHOULD CARE
    A step toward closing the gap with native apps
    Improves the user experience
    A feature some applications need, such as navigation apps
    Helps people with disabilities

  16. “DEMO IT OR IT DIDN'T HAPPEN”™
    Register on our website
    Name:
    Surname:
    Nationality:
    Start
    This demo can be found at https://jsbin.com/faguji/watch?output

  17. (image-only slide)

  18. WEB SPEECH API
    The Web Speech API allows you to deal with two aspects of
    computer-human interaction: Automatic Speech
    Recognition (ASR) and Text-to-Speech (TTS).
    Specifications: https://dvcs.w3.org/hg/speech-api/raw-file/tip/webspeechapi.html

  19. WEB SPEECH API
    Introduced at the end of 2012
    Defines two interfaces: one for recognition and one for
    synthesis
    Requires the user's permission before acquiring audio
    Agnostic of the underlying technology

  20. SPEECH RECOGNITION
    There are two types of recognition available: one-shot and
    continuous. The first stops as soon as the user stops talking;
    the second must be stopped programmatically.
    To instantiate a new speech recognizer you have to call
    SpeechRecognition():
    var recognizer = new SpeechRecognition();
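In Chrome the constructor is only exposed with the -webkit prefix (see the support table on the next slide), so in practice a small lookup helps. This is a sketch; the helper name `getSpeechRecognition` is ours:

```javascript
// Looks up the speech recognition constructor on the given global object,
// falling back to the vendor-prefixed name that Chrome uses.
function getSpeechRecognition(global) {
  return global.SpeechRecognition || global.webkitSpeechRecognition || null;
}

// In the browser:
// var SpeechRecognitionCtor = getSpeechRecognition(window);
// var recognizer = SpeechRecognitionCtor ? new SpeechRecognitionCtor() : null;
```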

  21. SPEECH RECOGNITION: BROWSERS SUPPORT
    Explorer Chrome Safari Firefox Opera
    None 25+ (-webkit) None None None
    Data updated to 8th May 2015

  22. SPEECH RECOGNITION: PROPERTIES
    continuous
    grammars*
    interimResults
    lang
    maxAlternatives
    serviceURI**
    *Up to Chrome 42, adding a grammar to the grammars property does nothing. This happens because the
    group is currently discussing options for which grammar formats should be supported, how built-in grammar
    types are specified, and default grammars when not specified.
    **serviceURI isn't exposed by any browser.

  23. SPEECH RECOGNITION: METHODS
    start()
    stop()
    abort()

  24. SPEECH RECOGNITION: EVENTS
    start
    end*
    audiostart
    audioend
    soundstart
    soundend
    speechstart
    speechend
    result
    nomatch
    error
    *Up to Chrome 42 on Windows 8.1, neither the result nor the error event is fired before the end event
    when only noises are produced (issue #428873).

  25. “IT'S SHOWTIME!”
    Start Stop
    This demo can be found at https://jsbin.com/zesew/watch?output

  26. SPEECH RECOGNITION: RESULTS
    Results are obtained as an object (that implements the
    SpeechRecognitionEvent interface) passed as the first
    argument of the handler attached to the result event.
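As a sketch of how such a handler digs the text out (the helper name `bestTranscript` is ours; the results shape follows the SpeechRecognitionResultList described in the specification, with the top-ranked alternative first in each result):

```javascript
// Assembles a transcript from a list of results, taking the top-ranked
// alternative of each one.
function bestTranscript(results) {
  var text = '';
  for (var i = 0; i < results.length; i++) {
    // results[i][0] is the most likely SpeechRecognitionAlternative
    text += results[i][0].transcript;
  }
  return text;
}

// In the browser it would be wired up like this:
// recognizer.addEventListener('result', function(event) {
//   console.log(bestTranscript(event.results));
// });
```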

  27. PROBLEM: SOMETIMES RECOGNITION SUCKS!
    Imagine a user of your website or web app says a command
    but the recognizer returns the wrong string. Your system is
    good and it asks the user to repeat it, but the recognition fails
    again.
    How can you get out of this loop?

  28. (IDEAL) SOLUTION: GRAMMARS

  29. SOLUTION: LEVENSHTEIN DISTANCE
    An approach that isn't ideal but that you can use today.
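A minimal sketch of the idea (both helper names are ours): compute the edit distance between the recognized text and each known command, and pick the closest one.

```javascript
// Computes the Levenshtein distance: the minimum number of single-character
// insertions, deletions, and substitutions turning string a into string b.
function levenshtein(a, b) {
  var row = [];
  for (var j = 0; j <= b.length; j++) row[j] = j;
  for (var i = 1; i <= a.length; i++) {
    var prev = row[0]; // distance for the previous row, previous column
    row[0] = i;
    for (var j = 1; j <= b.length; j++) {
      var tmp = row[j]; // value from the previous row, current column
      row[j] = Math.min(
        row[j] + 1,                             // deletion
        row[j - 1] + 1,                         // insertion
        prev + (a[i - 1] === b[j - 1] ? 0 : 1)  // substitution (or match)
      );
      prev = tmp;
    }
  }
  return row[b.length];
}

// Returns the known command closest to the recognized text.
function closestCommand(recognized, commands) {
  var best = commands[0];
  for (var i = 1; i < commands.length; i++) {
    if (levenshtein(recognized, commands[i]) < levenshtein(recognized, best)) {
      best = commands[i];
    }
  }
  return best;
}
```

So a misrecognized "send emal" still resolves to the "Send email" command instead of forcing the user to repeat it.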

  30. LEVENSHTEIN DISTANCE: EXAMPLE
    Commands available: "Send email", "Call"
    Names in the phonebook: "Aurelio De Rosa", "Annarita
    Tranfici", "John Doe"
    Recognized text:
    Updated text:
    Start
    This demo can be found at https://jsbin.com/tevogu/watch?output

  31. (image-only slide)

  32. SPEECH SYNTHESIS
    Provides text­to­speech functionality in the browser. This is
    especially useful for blind people and those with visual
    impairments in general.
    The feature is exposed via a speechSynthesis object that
    possesses static methods.

  33. SPEECH SYNTHESIS: BROWSERS SUPPORT
    Explorer Chrome Safari Firefox Opera
    None 33+ 7+ None 27+
    Data updated to 8th May 2015

  34. SPEECH SYNTHESIS: PROPERTIES
    pending
    speaking
    paused* **
    *Up to Chrome 42, pausing the utterance isn't reflected in a change of the paused property (issue #425553).
    **In Opera 27, pausing the utterance is reflected in an erroneous, reversed change of the paused property
    (issue #DNA-37487).

  35. SPEECH SYNTHESIS: METHODS
    speak()*
    cancel()
    pause()
    resume()
    getVoices()
    *Up to Chrome 42, speak() doesn't support SSML and doesn't strip unrecognized tags (issue #428902).

  36. SPEECH SYNTHESIS: VOICES

  37. SPEECH SYNTHESIS: EVENTS
    voiceschanged

  38. SPEECH SYNTHESIS: UTTERANCE INTERFACE
    The SpeechSynthesisUtterance interface represents the
    utterance (i.e. the text) that will be spoken by the synthesizer.

  39. SPEECH SYNTHESIS: UTTERANCE PROPERTIES
    lang
    pitch*
    rate*
    text**
    voice
    volume*
    *Up to Chrome 42, changing the pitch, volume, and rate properties does nothing (issue #376280).
    **Up to Chrome 42, the text property can't be set to an SSML (Speech Synthesis Markup Language)
    document because SSML isn't supported and Chrome doesn't strip the unrecognized tags (issue #428902).

  40. SPEECH SYNTHESIS: UTTERANCE EVENTS
    start
    end
    pause
    resume
    boundary*
    mark*
    error
    boundary and mark aren't supported by any browser because they're fired only when interacting with SSML
    documents.

  41. SHOW ME TEH CODEZ
    To set the text to emit, we can either pass it when instantiating
    an utterance object or set it later using the text property.

  42. EXAMPLE 1
    var utterance = new SpeechSynthesisUtterance('Hello!');
    utterance.lang = 'en-US';
    utterance.rate = 1.2;
    utterance.addEventListener('end', function() {
      console.log('Speech completed');
    });
    speechSynthesis.speak(utterance);

  43. EXAMPLE 2
    var utterance = new SpeechSynthesisUtterance();
    utterance.text = 'Hello!';
    utterance.lang = 'en-US';
    utterance.rate = 1.2;
    utterance.addEventListener('end', function() {
      console.log('Speech completed');
    });
    speechSynthesis.speak(utterance);

  44. SPEECH SYNTHESIS: DEMO IT MAN!
    I know my voice isn't very sexy, but I still want to say that this
    conference is wonderful and the audience of my talk is even
    better. You all rock!
    This demo can be found at https://jsbin.com/cipepa/watch?output

  45. HOW I DID IT

  46. INTERACTIVE FORM: RECIPE
    Promises (to avoid the callback hell)
    Speech recognition
    Speech synthesis
    The actual code is slightly different; I changed it for the sake of brevity and the limited size of the
    screen.

  47. INTERACTIVE FORM: STEP 1 - HTML
    <label for="name"
           data-question="What's your name?">Name:</label>
    <input type="text" id="name">
    <label for="surname"
           data-question="What's your surname?">Surname:</label>
    <input type="text" id="surname">
  48. INTERACTIVE FORM: STEP 2 - SUPPORT
    LIBRARY
    Create a Speech object containing two methods, speak and
    recognize, that each return a Promise.
    var Speech = {
      speak: function(text) {
        return new Promise(function(resolve, reject) {...});
      },
      recognize: function() {
        return new Promise(function(resolve, reject) {...});
      }
    };

  49. INTERACTIVE FORM: STEP 3 - JS 1/2
    function formData(promise, label) {
      return promise.then(function() {
        return Speech.speak(label.dataset.question);
      }).then(function() {
        return Speech.recognize().then(function(text) {
          document.getElementById(
            label.getAttribute('for')
          ).value = text;
        });
      });
    }

  50. INTERACTIVE FORM: STEP 3 - JS 2/2
    button.addEventListener('click', function(event) {
      var fieldLabels = document.querySelectorAll('label');
      var promise = new Promise(function(resolve) { resolve(); });
      function formData(promise, label) { /* code here */ }
      for (var i = 0; i < fieldLabels.length; i++) {
        promise = formData(promise, fieldLabels[i]);
      }
      promise.then(function() {
        return Speech.speak('Thank you for filling...');
      }).catch(function(error) {
        alert(error);
      });
    });
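The chaining above can be exercised outside the browser by stubbing the Speech object from step 2. Everything below (the stub, the askField name, the canned answers) is illustrative only: speak() just records the question and recognize() returns a canned answer, so the chain still fills the fields strictly in order.

```javascript
// Stand-in for the real Speech object so the sequential flow can be tried
// outside the browser.
var spoken = [];
var answers = ['Aurelio', 'De Rosa'];
var Speech = {
  speak: function(text) {
    spoken.push(text);           // record what would have been spoken
    return Promise.resolve();
  },
  recognize: function() {
    return Promise.resolve(answers.shift()); // next canned answer
  }
};

// Same chaining as in the slide: each field waits for the previous one.
function askField(promise, question, store) {
  return promise.then(function() {
    return Speech.speak(question);
  }).then(function() {
    return Speech.recognize().then(function(text) {
      store.push(text);
    });
  });
}

var filled = [];
var promise = Promise.resolve();
["What's your name?", "What's your surname?"].forEach(function(q) {
  promise = askField(promise, q, filled);
});

promise.then(function() {
  console.log(filled.join(' ')); // "Aurelio De Rosa"
});
```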

  51. DICTATION: RECIPE
    Speech recognition
    The actual code is slightly different; I changed it for the sake of brevity and the limited size of the
    screen.

  52. DICTATION: STEP 1 - HTML
    <p id="transcription"></p>
    <button id="btn-start">Start</button>
    <button id="btn-stop">Stop</button>

  53. DICTATION: STEP 2 - JS 1/3
    var recognizer = new SpeechRecognition();
    recognizer.interimResults = true;
    recognizer.continuous = true;
    var transcr = document.getElementById('transcription');
    var currTranscr = document.createElement('span');
    currTranscr.id = 'current-transcription';

  54. DICTATION: STEP 2 - JS 2/3
    recognizer.addEventListener('result', function(event) {
      currTranscr.textContent = '';
      var i = event.resultIndex;
      while (i < event.results.length) {
        var result = event.results[i++];
        if (result.isFinal) {
          transcr.removeChild(currTranscr);
          transcr.textContent += result[0].transcript;
          transcr.appendChild(currTranscr);
        } else {
          currTranscr.textContent += result[0].transcript;
        }
      }
    });

  55. DICTATION: STEP 2 - JS 3/3
    var btnStart = document.getElementById('btn-start');
    btnStart.addEventListener('click', function() {
      transcr.textContent = '';
      transcr.appendChild(currTranscr);
      recognizer.start();
    });
    var btnStop = document.getElementById('btn-stop');
    btnStop.addEventListener('click', function() {
      recognizer.stop();
    });

  56. ONE LAST DEMO...
    Video courtesy of Szymon Nowak (@szimek):
    https://www.youtube.com/watch?v=R8ejjVAZweg

  57. THANK YOU!

  58. QUESTIONS?

  59. CONTACTS
    Website: www.audero.it
    Email: [email protected]
    Twitter: @AurelioDeRosa
