Talking and listening to web pages - Topconf Tallinn 2015

As web developers, our job is to build nice, fast, and reliable websites, web apps, and web services. But our role isn't limited to this: we have to build these products not only for our ideal users but for as wide a range of people as possible. Today's browsers help us achieve this goal by providing APIs designed with this purpose in mind. One of these APIs is the Web Speech API, which provides speech input and text-to-speech output in a web browser.

In this talk you'll learn what the Web Speech API is and how it can drastically improve the way users, especially those with disabilities, perform tasks in your web pages.

Aurelio De Rosa

November 18, 2015

Transcript

  1. TALKING AND LISTENING TO WEB PAGES
    Aurelio De Rosa
    Tallinn, Estonia - 18 November 2015

  2. WEB DEVELOPER
    CONTRIBUTE(D) TO
    ...
    jQuery
    CanIUse
    GitHub.js
    PureCSS
    WRITE(D) FOR
    ...
    SitePoint
    Tuts+
    .NET magazine
    php [architect]
    Telerik

  3. AUTHORED BOOKS
    JQUERY IN ACTION (3RD EDITION)
    INSTANT JQUERY SELECTORS
    (Shameless self-promotion!)

  5. WHAT WE'LL COVER
    Natural language processing (NLP)
    Why it matters
    The Web Speech API
    Speech recognition
    Speech synthesis
    Issues and inconsistencies
    Demo

  6. NATURAL LANGUAGE
    PROCESSING (NLP)
    A field of computer science, artificial intelligence, and linguistics
    concerned with the interactions between computers and human
    (natural) languages.

  7. NATURAL LANGUAGE PROCESSING (NLP)
    It all started in 1950, when Alan Turing published an article titled
    “Computing Machinery and Intelligence” in which he proposed
    what is now called the Turing test.

  9. IT'S NOT ALL ABOUT TEXT

  10. ONCE UPON A TIME...

  11. VOICEXML
    It's an XML language for writing Web pages you interact with by
    listening to spoken prompts and other forms of audio that you
    can control by providing spoken inputs.
    Specifications: http://www.w3.org/TR/voicexml30/

  12. VOICEXML: EXAMPLE
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <vxml version="3.0" lang="en">
      <form>
        <field name="city">
          <prompt>Where do you want to travel to?</prompt>
          <option>New York</option>
          <option>London</option>
          <option>Tokyo</option>
        </field>
        <block>
          <submit next="http://www.test.com" namelist="city"/>
        </block>
      </form>
    </vxml>

  13. JAVA APPLET
    It's an application written in Java and delivered to users in the
    form of bytecode through a web page. The applet is then
    executed within a Java Virtual Machine (JVM) in a process
    separate from the browser itself.

  14. WHY I CARE

  15. WHY YOU SHOULD CARE
    A step toward closing the gap with native apps
    Improve the user experience
    A feature needed by some applications, such as navigation apps
    Help people with disabilities

  16. “DEMO IT OR IT DIDN'T HAPPEN”™
    Register on our website
    Name:
    Surname:
    Nationality:
    [Start]
    This demo can be found at https://jsbin.com/faguji/watch?output

  18. WEB SPEECH API
    The Web Speech API allows you to deal with two aspects of the
    computer-human interaction: Automatic Speech Recognition
    (ASR) and Text-to-Speech (TTS).
    Specifications: https://dvcs.w3.org/hg/speech-api/raw-file/tip/webspeechapi.html

  19. WEB SPEECH API
    Introduced at the end of 2012
    Defines two interfaces: one for recognition and one for
    synthesis
    Requires the user's permission before acquiring audio
    Agnostic of the underlying technology

  20. SPEECH RECOGNITION
    There are two types of recognition available: one-shot and
    continuous. The first stops as soon as the user stops talking, while
    the second must be stopped programmatically.
    To instantiate a new speech recognizer, you have to call the
    SpeechRecognition() constructor:
    var recognizer = new SpeechRecognition();
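At the time of this talk Chrome exposed the constructor only with its prefix (see the support table on the next slide), so `new SpeechRecognition()` fails there. A minimal sketch of a fallback, assuming the constructor is looked up on a global object passed as a parameter (`window` in a browser); `getRecognitionConstructor` and `createRecognizer` are hypothetical helper names:

```javascript
// Returns the speech recognition constructor exposed by the given global
// object, preferring the standard name and falling back to the prefixed
// one used by Chrome. Returns null when the API is not available.
function getRecognitionConstructor(globalObj) {
  return globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition || null;
}

// Creates a recognizer set up for one-shot (continuous = false) or
// continuous (continuous = true) recognition.
// Throws if the browser does not support speech recognition.
function createRecognizer(globalObj, continuous) {
  var Recognition = getRecognitionConstructor(globalObj);
  if (Recognition === null) {
    throw new Error('Speech recognition is not supported');
  }
  var recognizer = new Recognition();
  // false: stops when the user stops talking; true: must be stopped with stop()
  recognizer.continuous = Boolean(continuous);
  return recognizer;
}
```

In a browser you would call `createRecognizer(window, false)` for one-shot recognition or `createRecognizer(window, true)` for continuous recognition.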

  21. SPEECH RECOGNITION: BROWSERS SUPPORT
    IE/Edge: None
    Chrome: 25+ (-webkit)
    Safari: None
    Firefox: None
    Opera: None
    Data as of 17 November 2015

  22. SPEECH RECOGNITION: PROPERTIES
    continuous
    grammars*
    interimResults
    lang
    maxAlternatives
    serviceURI
    *Up to Chrome 46, adding a grammar to the grammars property does nothing. This happens because, as the
    specification puts it, "the group is currently discussing options for which grammar formats should be supported,
    how builtin grammar types are specified, and default grammars when not specified."
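A sketch of how these properties might be set together: the hypothetical `configureRecognizer` helper copies only the options you pass, leaving the others at their defaults. `grammars` and `serviceURI` are deliberately left out, the former because of the Chrome limitation described in the note above.

```javascript
// Properties this helper knows how to set. Defaults per the specification:
// continuous = false, interimResults = false, maxAlternatives = 1,
// lang = the language of the document.
var RECOGNIZER_OPTIONS = ['continuous', 'interimResults', 'lang', 'maxAlternatives'];

// Hypothetical helper: copies the known options that are present in
// `options` onto the recognizer and ignores everything else.
function configureRecognizer(recognizer, options) {
  RECOGNIZER_OPTIONS.forEach(function(name) {
    if (Object.prototype.hasOwnProperty.call(options, name)) {
      recognizer[name] = options[name];
    }
  });
  return recognizer;
}

// Usage in Chrome (prefixed constructor):
// var recognizer = configureRecognizer(new webkitSpeechRecognition(), {
//   lang: 'en-US',
//   interimResults: true,
//   maxAlternatives: 3
// });
```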

  23. SPEECH RECOGNITION: METHODS
    start()
    stop()
    abort()

  24. SPEECH RECOGNITION: EVENTS
    start
    end*
    audiostart
    audioend
    soundstart
    soundend
    speechstart
    speechend
    result
    nomatch
    error
    *Up to Chrome 46 on Windows, neither the result nor the error event is fired before the end event when only
    noise is produced (issue #428873).
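A quick way to see which of these events fire, and in which order, is to attach the same listener to all of them. A minimal sketch; `logRecognitionEvents` is a hypothetical helper, and since it only uses `addEventListener`, any `EventTarget` can stand in for a real recognizer:

```javascript
// The recognition events listed on the slide.
var RECOGNITION_EVENTS = [
  'start', 'end', 'audiostart', 'audioend', 'soundstart', 'soundend',
  'speechstart', 'speechend', 'result', 'nomatch', 'error'
];

// Attaches one listener to every recognition event and records the
// event types, in the order they fire, into the returned array.
function logRecognitionEvents(target) {
  var log = [];
  RECOGNITION_EVENTS.forEach(function(type) {
    target.addEventListener(type, function(event) {
      log.push(event.type);
    });
  });
  return log;
}
```

After calling `recognizer.start()` and speaking, the array shows the actual sequence. On Chrome 46 for Windows, an `end` with no preceding `result` or `error` when only noise is produced is the issue mentioned in the note.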

  25. “IT'S SHOWTIME!”
    [Start] [Stop]
    This demo can be found at https://jsbin.com/zesew/watch?output

  26. SPEECH RECOGNITION: RESULTS
    Results are obtained as an object (implementing the
    SpeechRecognitionEvent interface) passed as the first
    argument of the handler attached to the result event.
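The event exposes `resultIndex`, the index of the first result that changed, and `results`, a list of results. Each result has an `isFinal` flag and one or more alternatives, each with a `transcript` and a `confidence`. A sketch that reads them into plain objects (`readResults` is a hypothetical name):

```javascript
// Collects the results that changed in a SpeechRecognitionEvent, starting
// at event.resultIndex. Only index access and length are used, because the
// result lists are array-like objects rather than real arrays.
function readResults(event) {
  var changed = [];
  for (var i = event.resultIndex; i < event.results.length; i++) {
    var result = event.results[i];
    var alternatives = [];
    for (var j = 0; j < result.length; j++) {
      alternatives.push({
        transcript: result[j].transcript,
        confidence: result[j].confidence
      });
    }
    changed.push({ isFinal: result.isFinal, alternatives: alternatives });
  }
  return changed;
}

// Usage in a browser:
// recognizer.addEventListener('result', function(event) {
//   console.log(readResults(event));
// });
```

The number of alternatives per result is capped by the recognizer's `maxAlternatives` property.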

  27. PROBLEM: SOMETIMES RECOGNITION SUCKS!
    Imagine that a user of your website or web app says a command, but
    the recognizer returns the wrong string. Your system is well designed,
    so it asks the user to repeat the command, but recognition fails again.
    How can you get out of this loop?

  28. (IDEAL) SOLUTION: GRAMMARS
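In the specification, grammars are added to a `SpeechGrammarList` (prefixed as `webkitSpeechGrammarList` in Chrome), for example with `addFromString()`; the format commonly used in examples is JSGF. A sketch that builds a JSGF grammar from a list of commands. `buildCommandGrammar` is a hypothetical helper and, as noted on the properties slide, Chrome 46 ignores grammars, so this shows the intended usage rather than something that improved recognition at the time:

```javascript
// Builds a JSGF grammar that accepts exactly one of the given phrases.
// buildCommandGrammar('commands', ['Send email', 'Call']) returns:
// #JSGF V1.0; grammar commands; public <command> = send email | call;
function buildCommandGrammar(name, phrases) {
  var alternatives = phrases.map(function(phrase) {
    return phrase.toLowerCase();
  });
  return '#JSGF V1.0; grammar ' + name + '; public <command> = ' +
    alternatives.join(' | ') + ';';
}
```

In a browser, the string would then be attached to the recognizer: `var list = new webkitSpeechGrammarList(); list.addFromString(buildCommandGrammar('commands', ['Send email', 'Call']), 1); recognizer.grammars = list;`.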

  29. SOLUTION: LEVENSHTEIN DISTANCE
    An approach that isn't ideal but that you can use today.
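The Levenshtein distance between two strings is the minimum number of single-character insertions, deletions, and substitutions needed to turn one into the other. The idea is to compare the recognized text with the strings your app expects and use the closest one. A standard dynamic-programming sketch that keeps only two rows of the table:

```javascript
// Levenshtein distance: minimum number of single-character insertions,
// deletions, and substitutions that turn string a into string b.
// Keeps two rows of the dynamic-programming table, so memory is O(b.length).
function levenshtein(a, b) {
  var previous = [];
  for (var k = 0; k <= b.length; k++) {
    previous.push(k); // distance from '' to the first k chars of b
  }
  for (var i = 1; i <= a.length; i++) {
    var current = [i]; // distance from the first i chars of a to ''
    for (var j = 1; j <= b.length; j++) {
      var cost = a[i - 1] === b[j - 1] ? 0 : 1;
      current.push(Math.min(
        previous[j] + 1,        // deletion
        current[j - 1] + 1,     // insertion
        previous[j - 1] + cost  // substitution (or match)
      ));
    }
    previous = current;
  }
  return previous[b.length];
}
```

For example, `levenshtein('sand email', 'send email')` is 1, so a misheard "sand email" is one edit away from the "Send email" command.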

  30. LEVENSHTEIN DISTANCE: EXAMPLE
    Commands available: "Send email", "Call"
    Names in the phonebook: "Aurelio De Rosa", "Annarita
    Tranfici", "John Doe"
    Recognized text:
    Updated text:
    [Start]
    This demo can be found at https://jsbin.com/tevogu/watch?output
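A sketch of how the correction step might work, assuming the recognized text is expected to be a command followed by a name: build every "command name" combination, compare each with the recognized text using the Levenshtein distance, and keep the closest. This illustrates the idea and is not necessarily the demo's code; `correctCommand` is hypothetical, and the snippet includes its own Levenshtein implementation so that it stands alone.

```javascript
// Levenshtein distance (two-row dynamic programming).
function levenshtein(a, b) {
  var previous = [];
  for (var k = 0; k <= b.length; k++) {
    previous.push(k);
  }
  for (var i = 1; i <= a.length; i++) {
    var current = [i];
    for (var j = 1; j <= b.length; j++) {
      var cost = a[i - 1] === b[j - 1] ? 0 : 1;
      current.push(Math.min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost));
    }
    previous = current;
  }
  return previous[b.length];
}

// Returns the "command name" combination closest to the recognized text.
// The comparison is case-insensitive; on ties the first candidate wins.
function correctCommand(recognized, commands, names) {
  var input = recognized.toLowerCase();
  var best = null;
  var bestDistance = Infinity;
  commands.forEach(function(command) {
    names.forEach(function(name) {
      var candidate = command + ' ' + name;
      var distance = levenshtein(input, candidate.toLowerCase());
      if (distance < bestDistance) {
        best = candidate;
        bestDistance = distance;
      }
    });
  });
  return best;
}

var commands = ['Send email', 'Call'];
var names = ['Aurelio De Rosa', 'Annarita Tranfici', 'John Doe'];
correctCommand('call john do', commands, names); // 'Call John Doe'
```

In practice you would probably also reject matches whose distance is too large and ask the user to repeat; otherwise any input is turned into some command.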

  32. SPEECH SYNTHESIS
    Provides text-to-speech functionality in the browser. This is
    especially useful for blind people and, more generally, for people
    with visual impairments.
    The feature is exposed via the speechSynthesis object, which you
    use directly without creating an instance.

  33. SPEECH SYNTHESIS: BROWSERS SUPPORT
    IE/Edge: None
    Chrome: 33+
    Safari: 7+
    Firefox: 31+ (behind a flag)
    Opera: 27+
    Data as of 17 November 2015

  34. SPEECH SYNTHESIS: PROPERTIES
    pending
    speaking
    paused* **
    *Up to Chrome 46, pausing the utterance isn't reflected in the value of the paused property (issue #425553).
    **Up to Opera 33, when calling the pause()/resume() methods there is a delay in updating the value of the
    paused property, which might result in unexpected behavior (issue #DNA-45911).

  35. SPEECH SYNTHESIS: METHODS
    speak()*
    cancel()
    pause()
    resume()
    getVoices()
    *Up to Chrome 46, speak() doesn't support SSML and doesn't strip unrecognized tags (issue #428902).
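Given the `paused` issues in Chrome and Opera noted on the previous slide, one defensive option is to track the paused state yourself rather than reading `speechSynthesis.paused`. A sketch; `createPauseToggle` is a hypothetical helper that receives the synthesis object (`window.speechSynthesis` in a browser), and `pauseButton` in the usage comment is a hypothetical element:

```javascript
// Hypothetical helper: toggles between pause() and resume() while keeping
// its own paused flag, so callers don't depend on synth.paused.
function createPauseToggle(synth) {
  var paused = false;
  return {
    // Pauses if currently speaking, resumes if paused.
    // Returns the new state (true means paused).
    toggle: function() {
      if (paused) {
        synth.resume();
      } else {
        synth.pause();
      }
      paused = !paused;
      return paused;
    },
    isPaused: function() {
      return paused;
    }
  };
}

// Usage in a browser:
// var pauseToggle = createPauseToggle(window.speechSynthesis);
// pauseButton.addEventListener('click', function() {
//   pauseButton.textContent = pauseToggle.toggle() ? 'Resume' : 'Pause';
// });
```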

  36. SPEECH SYNTHESIS: VOICES

  37. SPEECH SYNTHESIS: EVENTS
    voiceschanged
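Chrome may return an empty array from `getVoices()` until the voices have loaded, and signals that they are available with the `voiceschanged` event. A sketch that waits for the voices when needed and then picks one for a language; `loadVoices` and `pickVoice` are hypothetical helpers, and the voice objects only need the `lang` and `default` properties that `SpeechSynthesisVoice` provides:

```javascript
// Resolves with the available voices. If getVoices() returns an empty
// list, waits for the first voiceschanged event and asks again.
function loadVoices(synth) {
  return new Promise(function(resolve) {
    var voices = synth.getVoices();
    if (voices.length > 0) {
      resolve(voices);
      return;
    }
    synth.addEventListener('voiceschanged', function onChange() {
      synth.removeEventListener('voiceschanged', onChange);
      resolve(synth.getVoices());
    });
  });
}

// Picks a voice for a language tag such as 'en-GB': an exact match first,
// then a voice with the same primary language ('en'), then the default
// voice, then the first voice. Returns null for an empty list.
function pickVoice(voices, lang) {
  var wanted = lang.toLowerCase();
  var primary = wanted.split('-')[0];
  var list = Array.prototype.slice.call(voices);
  return list.find(function(voice) { return voice.lang.toLowerCase() === wanted; }) ||
    list.find(function(voice) { return voice.lang.toLowerCase().split('-')[0] === primary; }) ||
    list.find(function(voice) { return voice.default; }) ||
    list[0] || null;
}

// Usage in a browser:
// loadVoices(speechSynthesis).then(function(voices) {
//   var utterance = new SpeechSynthesisUtterance('Hello');
//   utterance.voice = pickVoice(voices, 'en-GB');
//   speechSynthesis.speak(utterance);
// });
```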

  38. SPEECH SYNTHESIS: UTTERANCE INTERFACE
    The SpeechSynthesisUtterance interface represents the
    utterance (i.e. the text) that will be spoken by the synthesizer.

  39. SPEECH SYNTHESIS: UTTERANCE PROPERTIES
    lang
    pitch
    rate*
    text**
    voice
    volume
    *Up to Chrome 46, the rate property doesn't work correctly (issue #376280).
    **Up to Chrome 46, the text property can't be set to an SSML (Speech Synthesis Markup Language)
    document because SSML isn't supported and Chrome doesn't strip the unrecognized tags (issue #428902).
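The specification defines ranges for the numeric properties: `pitch` from 0 to 2, `rate` from 0.1 to 10, and `volume` from 0 to 1, each with a default of 1. A sketch of hypothetical helpers (`clampUtteranceValue`, `applyUtteranceOptions`) that clamp values into those ranges before assigning them, so out-of-range input never reaches the browser:

```javascript
// Ranges and defaults from the Web Speech API specification.
var UTTERANCE_RANGES = {
  pitch: { min: 0, max: 2, default: 1 },
  rate: { min: 0.1, max: 10, default: 1 },
  volume: { min: 0, max: 1, default: 1 }
};

// Returns the value clamped into the property's range, or the default
// when the value is missing or not a finite number.
function clampUtteranceValue(name, value) {
  var range = UTTERANCE_RANGES[name];
  if (typeof value !== 'number' || !isFinite(value)) {
    return range.default;
  }
  return Math.min(range.max, Math.max(range.min, value));
}

// Copies text, lang, and the clamped numeric values onto an utterance.
function applyUtteranceOptions(utterance, options) {
  utterance.text = options.text;
  if (options.lang) {
    utterance.lang = options.lang;
  }
  Object.keys(UTTERANCE_RANGES).forEach(function(name) {
    utterance[name] = clampUtteranceValue(name, options[name]);
  });
  return utterance;
}
```

For example, `applyUtteranceOptions(new SpeechSynthesisUtterance(), { text: 'Hello', lang: 'en-US', rate: 20 })` produces an utterance whose `rate` is 10.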

  40. SPEECH SYNTHESIS: UTTERANCE EVENTS
    start
    end
    pause
    resume
    boundary*
    mark*
    error
    *boundary and mark are not supported by any browser because they are fired by the interaction with SSML
    documents.

  41. SHOW ME TEH CODEZ
    To set the text to speak, we can either pass it when instantiating an
    utterance object or set it later using the text property.

  42. EXAMPLE 1
    var utterance = new SpeechSynthesisUtterance('Hello');
    utterance.lang = 'en-US';
    utterance.rate = 1;
    utterance.addEventListener('end', function() {
      console.log('Speech completed');
    });
    speechSynthesis.speak(utterance);

  43. EXAMPLE 2
    var utterance = new SpeechSynthesisUtterance();
    utterance.text = 'Hello';
    utterance.lang = 'en-US';
    utterance.rate = 1;
    utterance.addEventListener('end', function() {
      console.log('Speech completed');
    });
    speechSynthesis.speak(utterance);

  44. SPEECH SYNTHESIS: DEMO IT MAN!
    I know my voice isn't very sexy, but I still want to say that this
    conference is wonderful and the audience of my talk is even
    better. You all rock!
    This demo can be found at https://jsbin.com/cipepa/watch?output

  45. HOW I DID IT

  46. INTERACTIVE FORM: RECIPE
    Promises (to avoid callback hell)
    Speech recognition
    Speech synthesis
    The actual code is slightly different: I simplified it for the sake of brevity and to fit the limited size of the
    screen.

  47. INTERACTIVE FORM: STEP 1 - HTML
    <form id="form">
      <label for="name" data-question="What's your name?">Name</label>
      <input id="name" />
      <label for="surname" data-question="What's your surname?">Surname</label>
      <input id="surname" />
      <!-- Other label/element pairs here -->
      <input id="btn-voice" type="submit" value="Start" />
    </form>

  48. INTERACTIVE FORM: STEP 2 - SUPPORT LIBRARY
    Create a Speech object containing two methods, speak and
    recognize, that return a Promise.
    var Speech = {
      speak: function(text) {
        return new Promise(function(resolve, reject) { /* ... */ });
      },
      recognize: function() {
        return new Promise(function(resolve, reject) { /* ... */ });
      }
    };
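The slide leaves the promise bodies empty. A minimal sketch of how they might be filled, assuming the talk's simplified flow (speak one utterance, recognize one phrase). To keep it testable outside a browser, the hypothetical factory `createSpeech` receives the environment it uses (in a browser you would pass `window`); rejecting when recognition ends without a result is an assumption about the desired behavior.

```javascript
// Hypothetical factory: returns an object with speak() and recognize(),
// both returning a Promise. `env` must provide speechSynthesis,
// SpeechSynthesisUtterance, and SpeechRecognition or
// webkitSpeechRecognition (in a browser, pass window).
function createSpeech(env) {
  return {
    // Resolves once the utterance has been spoken; rejects on error.
    speak: function(text) {
      return new Promise(function(resolve, reject) {
        var utterance = new env.SpeechSynthesisUtterance(text);
        utterance.onend = function() {
          resolve();
        };
        utterance.onerror = function(event) {
          reject(new Error(event.error));
        };
        env.speechSynthesis.speak(utterance);
      });
    },
    // One-shot recognition: resolves with the best transcript of the first
    // result; rejects on error or if recognition ends without a result.
    recognize: function() {
      return new Promise(function(resolve, reject) {
        var Recognition = env.SpeechRecognition || env.webkitSpeechRecognition;
        var recognizer = new Recognition();
        var gotResult = false;
        recognizer.onresult = function(event) {
          gotResult = true;
          resolve(event.results[0][0].transcript);
        };
        recognizer.onerror = function(event) {
          reject(new Error(event.error));
        };
        recognizer.onend = function() {
          // Settling an already settled promise has no effect, so this only
          // matters when neither result nor error has fired.
          if (!gotResult) {
            reject(new Error('Recognition ended without a result'));
          }
        };
        recognizer.start();
      });
    }
  };
}

// Usage in a browser:
// var Speech = createSpeech(window);
```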

  49. INTERACTIVE FORM: STEP 3 - JS 1/2
    function formData(promise, label) {
      return promise.then(function() {
        return Speech.speak(label.dataset.question);
      }).then(function() {
        return Speech.recognize();
      }).then(function(text) {
        var field = label.getAttribute('for');
        document.getElementById(field).value = text;
        return Promise.resolve(text);
      });
    }

  50. INTERACTIVE FORM: STEP 3 - JS 2/2
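    var button = document.getElementById('btn-voice');
    button.addEventListener('click', function(event) {
      // The button is a submit input: stop it from submitting the form
      event.preventDefault();
      var fieldLabels = document.querySelectorAll('label');
      var promise = Promise.resolve();
      function formData(promise, label) { /* code here */ }
      for (var i = 0; i < fieldLabels.length; i++) {
        promise = formData(promise, fieldLabels[i]);
      }
      promise.then(function() {
        return Speech.speak('Thank you for filling');
      }).catch(function(error) {
        alert(error);
      });
    });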
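The loop above chains one promise per label so that the questions are asked one at a time. The same pattern can be written with `Array.prototype.reduce`. A sketch with the question-and-answer step passed in as a function (`askInSequence` and its `ask` parameter are hypothetical), collecting the answers in order:

```javascript
// Runs ask(item) for each item strictly one after another and resolves
// with the answers in the same order as the items. Works with arrays and
// array-like objects such as the NodeList from querySelectorAll().
function askInSequence(items, ask) {
  var answers = [];
  return Array.prototype.reduce.call(items, function(promise, item) {
    return promise.then(function() {
      return ask(item);
    }).then(function(answer) {
      answers.push(answer);
    });
  }, Promise.resolve()).then(function() {
    return answers;
  });
}

// Usage with the Speech object from the previous slides:
// askInSequence(document.querySelectorAll('label'), function(label) {
//   return Speech.speak(label.dataset.question).then(function() {
//     return Speech.recognize();
//   });
// });
```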

  51. DICTATION: RECIPE
    Speech recognition
    The actual code is slightly different: I simplified it for the sake of brevity and to fit the limited size of the
    screen.

  52. DICTATION: STEP 1 - HTML
    <div id="transcription" contenteditable="true"></div>
    <button id="btn-start">Start</button>
    <button id="btn-stop">Stop</button>

  53. DICTATION: STEP 2 - JS 1/3
    var recognizer = new SpeechRecognition();
    recognizer.interimResults = true;
    recognizer.continuous = true;
    var transcr = document.getElementById('transcription');
    var currTranscr = document.createElement('span');
    currTranscr.id = 'current-transcription';

  54. DICTATION: STEP 2 - JS 2/3
    recognizer.addEventListener('result', function(event) {
      currTranscr.textContent = '';
      var i = event.resultIndex;
      while (i < event.results.length) {
        var result = event.results[i++];
        if (result.isFinal) {
          transcr.removeChild(currTranscr);
          transcr.textContent += result[0].transcript;
          transcr.appendChild(currTranscr);
        } else {
          currTranscr.textContent += result[0].transcript;
        }
      }
    });
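The handler above mixes two concerns: splitting the changed results into final and interim text, and updating the DOM. A sketch of the first part as a pure function (`splitTranscripts` is a hypothetical name), which makes the logic easy to check without a microphone:

```javascript
// Splits the results of a SpeechRecognitionEvent, starting at resultIndex,
// into text that is final and text that may still change. Only the best
// alternative ([0]) of each result is used, as in the handler above.
function splitTranscripts(event) {
  var finalText = '';
  var interimText = '';
  for (var i = event.resultIndex; i < event.results.length; i++) {
    var result = event.results[i];
    if (result.isFinal) {
      finalText += result[0].transcript;
    } else {
      interimText += result[0].transcript;
    }
  }
  return { finalText: finalText, interimText: interimText };
}

// Usage with the elements from the previous slide:
// recognizer.addEventListener('result', function(event) {
//   var parts = splitTranscripts(event);
//   transcr.removeChild(currTranscr);
//   transcr.textContent += parts.finalText;
//   currTranscr.textContent = parts.interimText;
//   transcr.appendChild(currTranscr);
// });
```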

  55. DICTATION: STEP 2 - JS 3/3
    var btnStart = document.getElementById('btn-start');
    btnStart.addEventListener('click', function() {
      transcr.textContent = '';
      transcr.appendChild(currTranscr);
      recognizer.start();
    });
    var btnStop = document.getElementById('btn-stop');
    btnStop.addEventListener('click', function() {
      recognizer.stop();
    });

  56. ONE LAST DEMO...
    Video courtesy of Szymon Nowak (@szimek): https://www.youtube.com/watch?v=R8ejjVAZweg

  57. THANK YOU!

  58. QUESTIONS?

  59. CONTACTS
    Website: www.audero.it
    Email: [email protected]
    Twitter: @AurelioDeRosa
