Talking and listening to web pages - Topconf Tallinn 2015

As web developers, our job is to build nice, fast, and reliable websites, web apps, or web services. But our role isn't limited to this. We have to build these products not only for our ideal users but for as wide a range of people as possible. Today's browsers help us achieve this goal by providing APIs created with this purpose in mind. One of these is the Web Speech API, which provides speech input and text-to-speech output features in a web browser.

In this talk you'll learn what the Web Speech API is and how it can drastically improve the way users, especially those with disabilities, perform tasks on your web pages.

Aurelio De Rosa

November 18, 2015

Transcript

  1. TALKING AND LISTENING TO WEB PAGES
    Aurelio De Rosa
    Tallinn, Estonia - 18 November 2015

  2. WEB DEVELOPER
    CONTRIBUTE(D) TO
    ...
    jQuery
    CanIUse
    GitHub.js
    PureCSS
    WRITE(D) FOR
    ...
    SitePoint
    Tuts+
    .NET magazine
    php [architect]
    Telerik

  3. AUTHORED BOOKS
    JQUERY IN ACTION (3RD EDITION)
    INSTANT JQUERY SELECTORS
    (Shameless self-promotion!)

  4. WHAT WE'LL COVER
    Natural language processing (NLP)
    Why it matters
    The Web Speech API
    Speech recognition
    Speech synthesis
    Issues and inconsistencies
    Demo

  5. NATURAL LANGUAGE PROCESSING (NLP)
    A field of computer science, artificial intelligence, and linguistics
    concerned with the interactions between computers and human
    (natural) languages.

  6. NATURAL LANGUAGE PROCESSING (NLP)
    It all started in 1950 when Alan Turing published an article titled
    “Computing Machinery and Intelligence” where he proposed
    what is now called the Turing test.

  7. IT'S NOT ALL ABOUT TEXT

  8. ONCE UPON A TIME...

  9. VOICEXML
    It's an XML language for writing Web pages you interact with by
    listening to spoken prompts and other forms of audio that you
    can control by providing spoken inputs.
    Specifications: http://www.w3.org/TR/voicexml30/

  10. VOICEXML: EXAMPLE
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <vxml version="…" lang="en">
      <form>
        <field name="city">
          <prompt>Where do you want to travel to?</prompt>
          <option>New</option>
          <option>London</option>
          <option>Tokyo</option>
        </field>
        <block>
          <submit next="http://www.test.com" namelist="city"/>
        </block>
      </form>
    </vxml>

  11. JAVA APPLET
    It's an application written in Java and delivered to users in the
    form of bytecode through a web page. The applet is then
    executed within a Java Virtual Machine (JVM) in a process
    separated from the browser itself.

  12. WHY YOU SHOULD CARE
    A step toward closing the gap with native apps
    Improve user experience
    A feature some applications need, such as navigators
    Help people with disabilities

  13. “DEMO IT OR IT DIDN'T HAPPEN”™
    Register to our website
    Name:
    Surname:
    Nationality:
    Start
    This demo can be found at https://jsbin.com/faguji/watch?output

  14. WEB SPEECH API
    The Web Speech API allows you to deal with two aspects of the
    computer-human interaction: Automatic Speech Recognition
    (ASR) and Text-to-Speech (TTS).
    Specifications: https://dvcs.w3.org/hg/speech-api/raw-file/tip/webspeechapi.html

  15. WEB SPEECH API
    Introduced at the end of 2012
    Defines two interfaces: one for recognition and one for
    synthesis
    Requires the user's permission before acquiring audio
    Agnostic of the underlying technology

  16. SPEECH RECOGNITION
    There are two types of recognition available: one-shot and
    continuous. The first stops as soon as the user stops talking, the
    second must be stopped programmatically.
    To instantiate a new speech recognizer you have to call
    SpeechRecognition():
    var recognizer = new SpeechRecognition();

  17. SPEECH RECOGNITION: BROWSER SUPPORT
    IE/Edge    Chrome           Safari    Firefox    Opera
    None       25+ (-webkit)    None      None       None
    Data updated to 17th November 2015
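
Because Chrome exposes the constructor only behind the -webkit prefix, a page usually has to look for both names before instantiating a recognizer. The sketch below is one possible way to do that, not code from the talk: getRecognitionConstructor and createRecognizer are hypothetical helper names, and the global object is passed in as a parameter (an assumption made so the lookup can also be exercised outside a browser).

```javascript
// Hypothetical helper: returns the SpeechRecognition constructor exposed by
// the given global object (normally `window`), or null when unsupported.
function getRecognitionConstructor(globalObject) {
  return globalObject.SpeechRecognition ||
         globalObject.webkitSpeechRecognition ||
         null;
}

// Hypothetical helper: creates a recognizer in one-shot (default) or
// continuous mode. Returns null when the API is not available.
function createRecognizer(globalObject, continuous) {
  var Recognition = getRecognitionConstructor(globalObject);
  if (!Recognition) {
    return null;
  }
  var recognizer = new Recognition();
  recognizer.continuous = Boolean(continuous);
  return recognizer;
}

// In a browser:
// var recognizer = createRecognizer(window, true);
```

Returning null instead of throwing lets the page fall back to a plain form when speech input is missing.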

  18. SPEECH RECOGNITION: PROPERTIES
    continuous
    grammars*
    interimResults
    lang
    maxAlternatives
    serviceURI
    *Up to Chrome 46, adding a grammar to the grammars property does nothing. This happens because the group
    is currently discussing options for which grammar formats should be supported, how builtin grammar types are
    specified, and default grammars when not specified.

  19. SPEECH RECOGNITION: METHODS
    start()
    stop()
    abort()

  20. SPEECH RECOGNITION: EVENTS
    start
    end*
    audiostart
    audioend
    soundstart
    soundend
    speechstart
    speechend
    result
    nomatch
    error
    *Up to Chrome 46, Chrome on Windows doesn't fire the result or the error event before the end event when only
    noise is produced (issue #428873).
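
The footnote above implies that a handler for end cannot assume a result or error event came first. A minimal sketch of a guard follows, assuming a recognizer that implements addEventListener(). The helper watchRecognizer, its callback names, and the 'no-result' label are hypothetical and not part of the API.

```javascript
// Hypothetical helper: wires the result, nomatch, error, and end events and
// reports the case where recognition ends without any of the first three.
function watchRecognizer(recognizer, callbacks) {
  var settled = false;

  recognizer.addEventListener('result', function(event) {
    settled = true;
    callbacks.onResult(event);
  });
  recognizer.addEventListener('nomatch', function() {
    settled = true;
    callbacks.onFailure('nomatch');
  });
  recognizer.addEventListener('error', function(event) {
    settled = true;
    callbacks.onFailure(event.error);
  });
  recognizer.addEventListener('end', function() {
    if (!settled) {
      // Neither result, nomatch, nor error fired: report it with a
      // made-up label so the caller can ask the user to try again.
      callbacks.onFailure('no-result');
    }
    settled = false; // Ready for the next call to start()
  });
}
```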

  21. “IT'S SHOWTIME!”
    Start Stop
    This demo can be found at https://jsbin.com/zesew/watch?output

  22. SPEECH RECOGNITION: RESULTS
    Results are obtained as an object (that implements the
    SpeechRecognitionEvent interface) passed as the first
    argument of the handler attached to the result event.
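
Each entry of the event's results list is itself a list of alternatives, and each alternative carries a transcript and a confidence. The sketch below shows one way to read them, assuming maxAlternatives was set above 1 so that more than one alternative may be present. The helper bestAlternative is a hypothetical name.

```javascript
// Hypothetical helper: returns the transcript of the most confident
// alternative of the newest result carried by a result event.
function bestAlternative(event) {
  var result = event.results[event.results.length - 1];
  var best = result[0];
  for (var i = 1; i < result.length; i++) {
    if (result[i].confidence > best.confidence) {
      best = result[i];
    }
  }
  return best.transcript;
}

// In a browser:
// recognizer.addEventListener('result', function(event) {
//   console.log(bestAlternative(event));
// });
```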

  23. PROBLEM: SOMETIMES RECOGNITION SUCKS!
    Imagine a user of your website or web app says a command but
    the recognizer returns the wrong string. Your system is good and
    it asks the user to repeat it, but the recognition fails again.
    How can you get out of this loop?

  24. (IDEAL) SOLUTION: GRAMMARS

  25. SOLUTION: LEVENSHTEIN DISTANCE
    An approach that isn't ideal but that you can use today.

  26. LEVENSHTEIN DISTANCE: EXAMPLE
    Commands available: "Send email", "Call"
    Names in the phonebook: "Aurelio De Rosa", "Annarita
    Tranfici", "John Doe"
    Recognized text:
    Updated text:
    Start
    This demo can be found at https://jsbin.com/tevogu/watch?output
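
The idea can be sketched as follows: compute the Levenshtein distance between the recognized text and every known command or name, then keep the closest one. This is a generic illustration and not necessarily the code behind the demo. The helper closestMatch and the case-insensitive comparison are assumptions.

```javascript
// Classic dynamic-programming Levenshtein distance: the minimum number of
// single-character insertions, deletions, and substitutions turning a into b.
function levenshtein(a, b) {
  var previous = [];
  var j;
  for (j = 0; j <= b.length; j++) {
    previous[j] = j;
  }
  for (var i = 1; i <= a.length; i++) {
    var current = [i];
    for (j = 1; j <= b.length; j++) {
      var cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      current[j] = Math.min(
        previous[j] + 1,        // deletion
        current[j - 1] + 1,     // insertion
        previous[j - 1] + cost  // substitution
      );
    }
    previous = current;
  }
  return previous[b.length];
}

// Hypothetical helper: returns the candidate closest to the recognized text,
// compared case-insensitively.
function closestMatch(text, candidates) {
  var best = null;
  var bestDistance = Infinity;
  candidates.forEach(function(candidate) {
    var distance = levenshtein(text.toLowerCase(), candidate.toLowerCase());
    if (distance < bestDistance) {
      best = candidate;
      bestDistance = distance;
    }
  });
  return best;
}

var names = ['Aurelio De Rosa', 'Annarita Tranfici', 'John Doe'];
console.log(closestMatch('Aurelio De Rossa', names));
```

In practice a maximum acceptable distance is also worth adding, so that text far from every candidate is rejected instead of being mapped to the least bad option.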

  27. SPEECH SYNTHESIS
    Provides text-to-speech functionality in the browser. This is
    especially useful for blind people and those with visual
    impairments in general.
    The feature is exposed via a speechSynthesis object that
    possesses static methods.

  28. SPEECH SYNTHESIS: BROWSER SUPPORT
    IE/Edge    Chrome    Safari    Firefox                Opera
    None       33+       7+        31+ (behind a flag)    27+
    Data updated to 17th November 2015

  29. SPEECH SYNTHESIS: PROPERTIES
    pending
    speaking
    paused* **
    *Up to Chrome 46, pausing the utterance isn't reflected in the paused property (issue #425553)
    **Up to Opera 33, when calling the pause()/resume() methods there is a delay in updating the value of the
    paused property, which might result in unexpected behavior (issue #DNA-45911)

  30. SPEECH SYNTHESIS: METHODS
    speak()*
    cancel()
    pause()
    resume()
    getVoices()
    *Up to Chrome 46, speak() doesn't support SSML and doesn't strip unrecognized tags (issue #428902).
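
As a rough illustration of how the properties and methods fit together, the sketch below toggles between pause() and resume() based on the speaking and paused flags. The helper togglePause is a hypothetical name, and the synthesizer is passed in as a parameter (normally it would be window.speechSynthesis). In browsers where the paused flag is not updated promptly, the decision may be made on a stale value.

```javascript
// Hypothetical helper: pauses or resumes a synthesizer and returns the
// action taken, or null when nothing is being spoken.
function togglePause(synth) {
  if (!synth.speaking) {
    return null;
  }
  if (synth.paused) {
    synth.resume();
    return 'resume';
  }
  synth.pause();
  return 'pause';
}

// In a browser:
// button.addEventListener('click', function() {
//   togglePause(window.speechSynthesis);
// });
```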

  31. SPEECH SYNTHESIS: VOICES
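
The list returned by getVoices() can be empty when a page first asks for it, with the browser firing the voiceschanged event once the voices are available (Chrome behaves this way). A minimal sketch of waiting for the list follows, assuming the synthesizer implements addEventListener() and removeEventListener(). The helpers loadVoices and findVoice are hypothetical names.

```javascript
// Hypothetical helper: resolves with the list of voices, waiting for the
// voiceschanged event when the list is not populated yet.
function loadVoices(synth) {
  return new Promise(function(resolve) {
    var voices = synth.getVoices();
    if (voices.length > 0) {
      resolve(voices);
      return;
    }
    synth.addEventListener('voiceschanged', function onChange() {
      synth.removeEventListener('voiceschanged', onChange);
      resolve(synth.getVoices());
    });
  });
}

// Hypothetical helper: returns the first voice for the given language tag,
// or null when none matches.
function findVoice(voices, lang) {
  for (var i = 0; i < voices.length; i++) {
    if (voices[i].lang === lang) {
      return voices[i];
    }
  }
  return null;
}

// In a browser:
// loadVoices(window.speechSynthesis).then(function(voices) {
//   utterance.voice = findVoice(voices, 'en-US');
// });
```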

  32. SPEECH SYNTHESIS: EVENTS
    voiceschanged

  33. SPEECH SYNTHESIS: UTTERANCE INTERFACE
    The SpeechSynthesisUtterance interface represents the
    utterance (i.e. the text) that will be spoken by the synthesizer.

  34. SPEECH SYNTHESIS: UTTERANCE PROPERTIES
    lang
    pitch
    rate*
    text**
    voice
    volume
    *Up to Chrome 46, the rate property doesn't work correctly (issue #376280)
    **Up to Chrome 46, the text property can't be set to an SSML (Speech Synthesis Markup Language)
    document because it isn't supported and Chrome doesn't strip the unrecognized tags (issue #428902).
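
The numeric properties accept limited ranges: the specification defines volume between 0 and 1, rate between 0.1 and 10, and pitch between 0 and 2. A sketch of setting several properties at once while keeping values inside those ranges is shown below. The helper configureUtterance and the RANGES table are hypothetical names introduced for illustration.

```javascript
// Ranges defined by the specification for the numeric properties.
var RANGES = {
  volume: { min: 0, max: 1 },
  rate: { min: 0.1, max: 10 },
  pitch: { min: 0, max: 2 }
};

// Hypothetical helper: copies the given options onto an utterance, clamping
// numeric values to the ranges above.
function configureUtterance(utterance, options) {
  Object.keys(options).forEach(function(name) {
    var value = options[name];
    var range = RANGES[name];
    if (range) {
      value = Math.min(range.max, Math.max(range.min, value));
    }
    utterance[name] = value;
  });
  return utterance;
}

// In a browser:
// var utterance = configureUtterance(
//   new SpeechSynthesisUtterance('Hello'),
//   { lang: 'en-US', rate: 1.5, pitch: 1 }
// );
```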

  35. SPEECH SYNTHESIS: UTTERANCE EVENTS
    start
    end
    pause
    resume
    boundary*
    mark*
    error
    *boundary and mark are not supported by any browser because they are fired by the interaction with SSML
    documents.

  36. SHOW ME TEH CODEZ
    To set the text to emit, we can either pass it when instantiating an
    utterance object or set it later using the text property.

  37. EXAMPLE 1
    var utterance = new SpeechSynthesisUtterance('Hello');
    utterance.lang = 'en-US';
    utterance.rate = …;
    utterance.addEventListener('end', function() {
      console.log('Speech completed');
    });
    speechSynthesis.speak(utterance);

  38. EXAMPLE 2
    var utterance = new SpeechSynthesisUtterance();
    utterance.text = 'Hello';
    utterance.lang = 'en-US';
    utterance.rate = …;
    utterance.addEventListener('end', function() {
      console.log('Speech completed');
    });
    speechSynthesis.speak(utterance);

  39. SPEECH SYNTHESIS: DEMO IT MAN!
    I know my voice isn't very sexy, but I still want to say that this
    conference is wonderful and the audience of my talk is even
    better. You all rock!
    This demo can be found at https://jsbin.com/cipepa/watch?output

  40. HOW I DID IT

  41. INTERACTIVE FORM: RECIPE
    Promises (to avoid the callback hell)
    Speech recognition
    Speech synthesis
    The actual code is a bit different but I made the changes for the sake of brevity and the limited size of the
    screen.

  42. INTERACTIVE FORM: STEP 1 - HTML
    <form id="form">
      <label for="name"
             data-question="What's your name?">Name:</label>
      <input id="name" />
      <label for="surname"
             data-question="What's your surname?">Surname:</label>
      <input id="surname" />
      <!-- Other label/element pairs here -->
      <input id="btn-voice" type="submit" value="Start" />
    </form>

  43. INTERACTIVE FORM: STEP 2 - SUPPORT LIBRARY
    Create a Speech object containing two methods, speak and
    recognize, that return a Promise.
    var Speech = {
      speak: function(text) {
        return new Promise(function(resolve, reject) { /* ... */ });
      },
      recognize: function() {
        return new Promise(function(resolve, reject) { /* ... */ });
      }
    };
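
The slide leaves the bodies of the two methods out. One way they might be filled in is sketched below. This is an assumption, not the code used in the demo: createSpeech is a hypothetical factory, and the synthesizer, the utterance constructor, and the recognizer constructor are passed in as parameters instead of being read from window.

```javascript
// Hypothetical factory: builds an object with the speak and recognize
// methods described above, each returning a Promise.
function createSpeech(synth, Utterance, Recognition) {
  return {
    speak: function(text) {
      return new Promise(function(resolve, reject) {
        var utterance = new Utterance(text);
        utterance.addEventListener('end', function() {
          resolve();
        });
        utterance.addEventListener('error', function() {
          reject(new Error('Speech synthesis failed'));
        });
        synth.speak(utterance);
      });
    },
    recognize: function() {
      return new Promise(function(resolve, reject) {
        var recognizer = new Recognition();
        recognizer.addEventListener('result', function(event) {
          resolve(event.results[0][0].transcript);
        });
        recognizer.addEventListener('error', function(event) {
          reject(new Error('Speech recognition failed: ' + event.error));
        });
        recognizer.addEventListener('end', function() {
          // No-op if a result or an error already settled the promise.
          reject(new Error('Recognition ended without a result'));
        });
        recognizer.start();
      });
    }
  };
}

// In a browser:
// var Speech = createSpeech(
//   window.speechSynthesis,
//   window.SpeechSynthesisUtterance,
//   window.SpeechRecognition || window.webkitSpeechRecognition
// );
```

Rejecting on end covers the case where recognition stops without delivering a result or an error, because a promise that is already settled ignores later calls to reject.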

  44. INTERACTIVE FORM: STEP 3 - JS 1/2
    function formData(promise, label) {
      return promise.then(function() {
        return Speech.speak(label.dataset.question);
      }).then(function() {
        return Speech.recognize();
      })
      .then(function(text) {
        var field = label.getAttribute('for');
        document.getElementById(field).value = text;
        return Promise.resolve(text);
      });
    }

  45. INTERACTIVE FORM: STEP 3 - JS 2/2
    button.addEventListener('click', function(event) {
      var fieldLabels = document.querySelectorAll('label');
      var promise = Promise.resolve();
      function formData(promise, label) { /* code here */ }
      for (var i = 0; i < fieldLabels.length; i++) {
        promise = formData(promise, fieldLabels[i]);
      }
      promise.then(function() {
        return Speech.speak('Thank you for filling...');
      }).catch(function(error) {
        alert(error);
      });
    });
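
The loop above builds a chain in which each question starts only after the previous answer has been handled. The same pattern can be sketched in isolation with reduce(). The helper inSequence is a hypothetical name and is not part of the demo.

```javascript
// Hypothetical helper: runs the given task for each item strictly one after
// another and resolves with the collected results.
function inSequence(items, task) {
  var results = [];
  return items.reduce(function(promise, item) {
    return promise.then(function() {
      return task(item);
    }).then(function(result) {
      results.push(result);
    });
  }, Promise.resolve()).then(function() {
    return results;
  });
}

// In a browser, with the labels converted to an array first:
// inSequence(Array.prototype.slice.call(fieldLabels), askQuestion);
// where askQuestion is a hypothetical function that speaks the question
// and resolves with the recognized answer.
```

Unlike Promise.all(), which would start every question at once, this keeps a single prompt active at any time.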

  46. DICTATION: RECIPE
    Speech recognition
    The actual code is a bit different but I made the changes for the sake of brevity and the limited size of the
    screen.

  47. DICTATION: STEP 1 - HTML
    <div id="transcription" contenteditable="true"></div>
    <button id="btn-start">Start</button>
    <button id="btn-stop">Stop</button>

  48. DICTATION: STEP 2 - JS 1/3
    var recognizer = new SpeechRecognition();
    recognizer.interimResults = true;
    recognizer.continuous = true;
    var transcr = document.getElementById('transcription');
    var currTranscr = document.createElement('span');
    currTranscr.id = 'current-transcription';

  49. DICTATION: STEP 2 - JS 2/3
    recognizer.addEventListener('result', function(event) {
      currTranscr.textContent = '';
      var i = event.resultIndex;
      while (i < event.results.length) {
        var result = event.results[i++];
        if (result.isFinal) {
          transcr.removeChild(currTranscr);
          transcr.textContent += result[0].transcript;
          transcr.appendChild(currTranscr);
        } else {
          currTranscr.textContent += result[0].transcript;
        }
      }
    });
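
The handler above mixes two concerns: deciding which text is final and updating the page. The first part can be sketched as a function without any DOM access, which makes the role of resultIndex and isFinal easier to see. The helper splitTranscripts and its return shape are hypothetical.

```javascript
// Hypothetical helper: splits the results carried by a result event,
// starting at event.resultIndex, into final and interim text.
function splitTranscripts(event) {
  var text = { finalText: '', interimText: '' };
  for (var i = event.resultIndex; i < event.results.length; i++) {
    var result = event.results[i];
    if (result.isFinal) {
      text.finalText += result[0].transcript;
    } else {
      text.interimText += result[0].transcript;
    }
  }
  return text;
}

// In a browser:
// recognizer.addEventListener('result', function(event) {
//   var text = splitTranscripts(event);
//   console.log(text.finalText, text.interimText);
// });
```

Results before resultIndex are skipped because they were already delivered by an earlier event.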

  50. DICTATION: STEP 2 - JS 3/3
    var btnStart = document.getElementById('btn-start');
    btnStart.addEventListener('click', function() {
      transcr.textContent = '';
      transcr.appendChild(currTranscr);
      recognizer.start();
    });
    var btnStop = document.getElementById('btn-stop');
    btnStop.addEventListener('click', function() {
      recognizer.stop();
    });

  51. ONE LAST DEMO...
    Video courtesy of Szymon Nowak (@szimek):
    https://www.youtube.com/watch?v=R8ejjVAZweg

  52. CONTACTS
    Website: www.audero.it
    Email: [email protected]
    Twitter: @AurelioDeRosa
