Talking and listening to web pages - Topconf Tallinn 2015

As web developers, our job is to build nice, fast, and reliable websites, web apps, or web services. But our role isn't limited to this. We have to build these products not only for our ideal users but for as wide a range of people as possible. Today's browsers help us achieve this goal by providing APIs designed with this purpose in mind. One of these APIs is the Web Speech API, which provides speech input and text-to-speech output in a web browser.

In this talk you'll learn what the Web Speech API is and how it can drastically improve the way users, especially those with disabilities, perform tasks in your web pages.

Aurelio De Rosa

November 18, 2015

Transcript

  1. WEB DEVELOPER CONTRIBUTE(D) TO ... jQuery CanIUse GitHub.js PureCSS WRITE(D)

    FOR ... SitePoint Tuts+ .NET magazine php [architect] Telerik
  2. WHAT WE'LL COVER Natural language processing (NLP) Why it matters

    The Web Speech API Speech recognition Speech synthesis Issues and inconsistencies Demo
  3. NATURAL LANGUAGE PROCESSING (NLP) A field of computer science, artificial

    intelligence, and linguistics concerned with the interactions between computers and human (natural) languages.
  4. NATURAL LANGUAGE PROCESSING (NLP) It all started in 1950 when

    Alan Turing published an article titled “Computing Machinery and Intelligence” where he proposed what is now called the Turing test.
  5. VOICEXML It's an XML language for writing Web pages you

    interact with by listening to spoken prompts and other forms of audio that you can control by providing spoken inputs. Specifications: http://www.w3.org/TR/voicexml30/
  6. VOICEXML: EXAMPLE

        <?xml version="1.0" encoding="ISO-8859-1"?>
        <vxml version="2.0" lang="en">
          <form>
            <field name="city">
              <prompt>Where do you want to travel to?</prompt>
              <option>New York</option>
              <option>London</option>
              <option>Tokyo</option>
            </field>
            <block>
              <submit next="http://www.test.com" namelist="city"/>
            </block>
          </form>
        </vxml>
  7. JAVA APPLET It's an application written in Java and delivered

    to users in the form of bytecode through a web page. The applet is then executed within a Java Virtual Machine (JVM) in a process separate from the browser itself.
  8. WHY YOU SHOULD CARE A step ahead in filling the

    gap with native apps. Improves the user experience. A feature needed by some applications, such as navigation apps. Helps people with disabilities.
  9. “DEMO IT OR IT DIDN'T HAPPEN”™ Register on our website

    (form with Name, Surname, and Nationality fields and a Start button) This demo can be found at https://jsbin.com/faguji/watch?output
  10. WEB SPEECH API The Web Speech API allows you to

    deal with two aspects of the computer-human interaction: Automatic Speech Recognition (ASR) and Text-to-Speech (TTS). Specifications: https://dvcs.w3.org/hg/speech-api/raw-file/tip/webspeechapi.html
  11. WEB SPEECH API Introduced at the end of 2012. Defines

    two interfaces: one for recognition and one for synthesis. Requires the user's permission before acquiring audio. Agnostic of the underlying technology.
  12. SPEECH RECOGNITION There are two types of recognition available: one-shot

    and continuous. The first stops as soon as the user stops talking; the second must be stopped programmatically. To instantiate a new speech recognizer, you have to call the SpeechRecognition() constructor:

        var recognizer = new SpeechRecognition();
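Because Chrome exposes the constructor only with the `webkit` prefix (see the support table in the next slide), a page usually looks it up under both names. This is a minimal sketch, not code from the talk: `getSpeechRecognition` is a hypothetical helper, and the global object is passed in as `scope` so the lookup can also run outside a browser.

```javascript
// Hypothetical helper: returns the speech recognition constructor exposed by
// `scope` (normally `window`), trying the standard name first and then the
// webkit-prefixed one. Returns null when neither is available.
function getSpeechRecognition(scope) {
  return scope.SpeechRecognition || scope.webkitSpeechRecognition || null;
}

// Usage in a page (skipped when there is no `window`, e.g. in Node.js).
if (typeof window !== 'undefined') {
  var Recognition = getSpeechRecognition(window);
  if (Recognition) {
    var recognizer = new Recognition();
  } else {
    console.log('Speech recognition is not supported in this browser');
  }
}
```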
  13. SPEECH RECOGNITION: BROWSERS SUPPORT

    IE/Edge: none; Chrome: 25+ (-webkit prefix); Safari: none; Firefox: none; Opera: none. Data updated to 17th November 2015.
  14. SPEECH RECOGNITION: PROPERTIES continuous, grammars*, interimResults, lang, maxAlternatives, serviceURI

    *Up to Chrome 46, adding a grammar to the grammars property does nothing. This happens because, in the words of the specification, "the group is currently discussing options for which grammar formats should be supported, how builtin grammar types are specified, and default grammars when not specified."
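A sketch of how these properties are typically set before calling start(). The helper `configureRecognizer` and its `options` object are hypothetical; the defaults it applies mirror the specification's defaults for continuous, interimResults, and maxAlternatives (false, false, 1), while lang is left to the browser when not given. grammars and serviceURI are not touched, given the note above.

```javascript
// Hypothetical helper: applies the recognition properties from this slide
// to an existing recognizer and returns it.
function configureRecognizer(recognizer, options) {
  options = options || {};
  // true: keep listening until stop() is called; false: one-shot recognition
  recognizer.continuous = Boolean(options.continuous);
  // true: also deliver results that may still change (isFinal === false)
  recognizer.interimResults = Boolean(options.interimResults);
  // maximum number of alternative transcriptions per result
  recognizer.maxAlternatives = options.maxAlternatives || 1;
  // BCP 47 language tag such as 'en-US'; only set when provided
  if (options.lang) {
    recognizer.lang = options.lang;
  }
  return recognizer;
}
```

In Chrome, `configureRecognizer(new webkitSpeechRecognition(), { lang: 'en-US' })` would prepare a one-shot recognizer for American English.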
  15. SPEECH RECOGNITION: EVENTS start, end*, audiostart, audioend, soundstart, soundend, speechstart, speechend, result, nomatch, error

    *Up to Chrome 46 on Windows, neither the result nor the error event is fired before the end event when only noise is produced (issue #428873).
  16. “IT'S SHOWTIME!” (demo with Start and Stop buttons) This demo can be found at

    https://jsbin.com/zesew/watch?output
  17. SPEECH RECOGNITION: RESULTS Results are delivered as an object implementing

    the SpeechRecognitionEvent interface, passed as the first argument to the handler attached to the result event.
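The event exposes a `results` list and a `resultIndex` marking the first entry that changed; each entry holds one or more alternatives (each with a `transcript` and a `confidence`) plus an `isFinal` flag. The following sketch, using the hypothetical helper name `collectResults`, copies the new entries into plain objects. It is not code from the talk.

```javascript
// Hypothetical helper: converts the new results carried by a
// SpeechRecognitionEvent into plain objects.
function collectResults(event) {
  var collected = [];
  // Entries before resultIndex were already delivered by earlier events.
  for (var i = event.resultIndex; i < event.results.length; i++) {
    var result = event.results[i];
    var alternatives = [];
    // Each result is an array-like list of alternatives.
    for (var j = 0; j < result.length; j++) {
      alternatives.push({
        transcript: result[j].transcript,
        confidence: result[j].confidence
      });
    }
    collected.push({ isFinal: result.isFinal, alternatives: alternatives });
  }
  return collected;
}
```

In a page it would be used as `recognizer.addEventListener('result', function (event) { console.log(collectResults(event)); });`.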
  18. PROBLEM: SOMETIMES RECOGNITION SUCKS! Imagine that a user of your website

    or web app says a command, but the recognizer returns the wrong string. Your system is well designed, so it asks the user to repeat the command, but recognition fails again. How can you get out of this loop?
  19. LEVENSHTEIN DISTANCE: EXAMPLE Commands available: "Send email", "Call". Names in

    the phonebook: "Aurelio De Rosa", "Annarita Tranfici", "John Doe". (demo showing the recognized text, the updated text, and a Start button) This demo can be found at https://jsbin.com/tevogu/watch?output
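The demo maps the recognized text onto the closest available command or phonebook name using the Levenshtein distance: the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. Below is a minimal sketch of that idea, not the demo's own code; `levenshtein` and `closestMatch` are hypothetical names, and comparing in lower case is an assumption.

```javascript
// Levenshtein distance via dynamic programming, keeping only one row of
// the table at a time.
function levenshtein(a, b) {
  var previous = [];
  var i, j;
  for (j = 0; j <= b.length; j++) {
    previous[j] = j; // distance from '' to the first j characters of b
  }
  for (i = 1; i <= a.length; i++) {
    var current = [i]; // distance from the first i characters of a to ''
    for (j = 1; j <= b.length; j++) {
      var cost = a[i - 1] === b[j - 1] ? 0 : 1;
      current[j] = Math.min(
        previous[j] + 1,        // deletion
        current[j - 1] + 1,     // insertion
        previous[j - 1] + cost  // substitution (or match)
      );
    }
    previous = current;
  }
  return previous[b.length];
}

// Returns the candidate closest to the recognized text (case-insensitive).
function closestMatch(text, candidates) {
  var best = null;
  var bestDistance = Infinity;
  candidates.forEach(function (candidate) {
    var distance = levenshtein(text.toLowerCase(), candidate.toLowerCase());
    if (distance < bestDistance) {
      best = candidate;
      bestDistance = distance;
    }
  });
  return best;
}
```

For instance, `closestMatch('sand email', ['Send email', 'Call'])` returns 'Send email', since a single substitution separates the two strings. A real app would probably also reject matches above some distance threshold; otherwise any input maps to some command.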
  20. SPEECH SYNTHESIS Provides text-to-speech functionality in the browser. This is

    especially useful for blind people and those with visual impairments in general. The feature is exposed via the speechSynthesis object, whose methods are called directly on it without creating an instance.
  21. SPEECH SYNTHESIS: BROWSERS SUPPORT

    IE/Edge: none; Chrome: 33+; Safari: 7+; Firefox: 31+ (behind a flag); Opera: 27+. Data updated to 17th November 2015.
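Given the uneven support in this table, a page can check for the feature before using it. A minimal sketch with a hypothetical helper name; `scope` stands in for `window` so the check can also run outside a browser.

```javascript
// Hypothetical helper: synthesis is usable when both the speechSynthesis
// object and the SpeechSynthesisUtterance constructor are exposed.
function supportsSpeechSynthesis(scope) {
  return 'speechSynthesis' in scope && 'SpeechSynthesisUtterance' in scope;
}

// Usage in a page (skipped when there is no `window`).
if (typeof window !== 'undefined' && supportsSpeechSynthesis(window)) {
  window.speechSynthesis.speak(new SpeechSynthesisUtterance('Hello'));
}
```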
  22. SPEECH SYNTHESIS: PROPERTIES pending, speaking, paused* **

    *Up to Chrome 46, pausing the utterance isn't reflected in the value of the paused property (issue #425553). **Up to Opera 33, when calling the pause()/resume() methods there is a delay in updating the value of the paused property, which might result in unexpected behavior (issue #DNA-45911).
  23. SPEECH SYNTHESIS: METHODS speak()*, cancel(), pause(), resume(), getVoices()

    *Up to Chrome 46, speak() doesn't support SSML and doesn't strip unrecognized tags (issue #428902).
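A sketch combining the speaking and paused properties with the pause() and resume() methods to build a pause/resume toggle. `togglePause` is a hypothetical helper; `synth` would be `window.speechSynthesis` in a page. Per the specification, speaking stays true while an utterance is paused, which is why paused is checked second. Given the Chrome and Opera issues noted for paused, the returned state is best effort.

```javascript
// Hypothetical helper: pauses the speech in progress, or resumes it if it
// is paused. Returns the action taken.
function togglePause(synth) {
  if (!synth.speaking) {
    return 'idle'; // nothing is being spoken
  }
  if (synth.paused) {
    synth.resume();
    return 'resumed';
  }
  synth.pause();
  return 'paused';
}
```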
  24. SPEECH SYNTHESIS: UTTERANCE PROPERTIES lang, pitch, rate*, text**, voice, volume

    *Up to Chrome 46, the rate property doesn't work correctly (issue #376280). **Up to Chrome 46, the text property can't be set to an SSML (Speech Synthesis Markup Language) document because SSML isn't supported and Chrome doesn't strip the unrecognized tags (issue #428902).
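A sketch of filling the voice property from the list returned by getVoices(), matching on each voice's lang and preferring the one flagged as default. `pickVoice` is a hypothetical helper. In Chrome the voice list may be empty until speechSynthesis has fired its voiceschanged event, so a page may need to wait for that event first.

```javascript
// Hypothetical helper: returns a voice whose lang matches the requested
// BCP 47 tag (case-insensitively), preferring the default voice, or null.
function pickVoice(synth, lang) {
  var wanted = lang.toLowerCase();
  var matches = synth.getVoices().filter(function (voice) {
    return voice.lang.toLowerCase() === wanted;
  });
  var defaults = matches.filter(function (voice) {
    return voice.default;
  });
  return defaults[0] || matches[0] || null;
}

// Usage in a page (skipped when there is no `window`).
if (typeof window !== 'undefined' && 'speechSynthesis' in window) {
  var utterance = new SpeechSynthesisUtterance('Hello');
  utterance.lang = 'en-US';
  utterance.pitch = 1.2;  // range 0 to 2, default 1
  utterance.volume = 0.8; // range 0 to 1, default 1
  var voice = pickVoice(window.speechSynthesis, 'en-US');
  if (voice) {
    utterance.voice = voice; // otherwise the browser picks one
  }
  window.speechSynthesis.speak(utterance);
}
```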
  25. SPEECH SYNTHESIS: UTTERANCE EVENTS start, end, pause, resume, boundary*, mark*, error

    *boundary and mark are not supported by any browser because they are fired by the interaction with SSML documents.
  26. SHOW ME TEH CODEZ To set the text to emit,

    we can either pass it when instantiating an utterance object or set it later using the text property.
  27. EXAMPLE 1

        var utterance = new SpeechSynthesisUtterance('Hello');
        utterance.lang = 'en-US';
        // utterance.rate = …; (the numeric value did not survive the transcript)
        utterance.addEventListener('end', function() {
          console.log('Speech completed');
        });
        speechSynthesis.speak(utterance);
  28. EXAMPLE 2

        var utterance = new SpeechSynthesisUtterance();
        utterance.text = 'Hello';
        utterance.lang = 'en-US';
        // utterance.rate = …; (the numeric value did not survive the transcript)
        utterance.addEventListener('end', function() {
          console.log('Speech completed');
        });
        speechSynthesis.speak(utterance);
  29. SPEECH SYNTHESIS: DEMO IT MAN! I know my voice isn't

    very sexy, but I still want to say that this conference is wonderful and the audience of my talk is even better. You all rock! This demo can be found at https://jsbin.com/cipepa/watch?output
  30. INTERACTIVE FORM: RECIPE Promises (to avoid callback hell), speech

    recognition, speech synthesis. The actual code is a bit different: I simplified it for brevity and to fit the limited size of the screen.
  31. INTERACTIVE FORM: STEP 1 - HTML

        <form id="form">
          <label for="name" data-question="What's your name?">Name</label>
          <input id="name">
          <label for="surname" data-question="What's your surname?">Surname</label>
          <input id="surname">
          <!-- Other label/element pairs here -->
          <input id="btn-voice" type="submit" value="Start">
        </form>
  32. INTERACTIVE FORM: STEP 2 - SUPPORT LIBRARY Create a Speech

    object containing two methods, speak and recognize, that return a Promise.

        var Speech = {
          speak: function(text) {
            // Speak `text`; resolve when done, reject on error
            return new Promise(function(resolve, reject) {});
          },
          recognize: function() {
            // Listen once; resolve with the recognized text
            return new Promise(function(resolve, reject) {});
          }
        };
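One possible way to fill in the two empty promise bodies, offered as a sketch under assumptions rather than the talk's actual code. The browser objects are injected into a hypothetical factory, `createSpeech`, so the library can be exercised outside a browser; in a page it would be created with `createSpeech(window.speechSynthesis, window.SpeechSynthesisUtterance, window.SpeechRecognition || window.webkitSpeechRecognition)`.

```javascript
// Hypothetical factory returning an object with the same speak/recognize
// interface as the Speech object above.
function createSpeech(synth, Utterance, Recognition) {
  return {
    // Speaks `text`; resolves when the utterance ends, rejects on error.
    speak: function (text) {
      return new Promise(function (resolve, reject) {
        var utterance = new Utterance(text);
        utterance.addEventListener('end', function () {
          resolve();
        });
        utterance.addEventListener('error', reject);
        synth.speak(utterance);
      });
    },
    // One-shot recognition (continuous defaults to false): resolves with the
    // first alternative of the first result. Rejects on error, on nomatch,
    // or if the session ends without a result; once the promise has
    // resolved, later reject() calls have no effect.
    recognize: function () {
      return new Promise(function (resolve, reject) {
        var recognizer = new Recognition();
        recognizer.addEventListener('result', function (event) {
          resolve(event.results[0][0].transcript);
        });
        recognizer.addEventListener('nomatch', function () {
          reject(new Error('Speech was not recognized'));
        });
        recognizer.addEventListener('error', function (event) {
          reject(new Error('Recognition error: ' + event.error));
        });
        recognizer.addEventListener('end', function () {
          reject(new Error('Recognition ended without a result'));
        });
        recognizer.start();
      });
    }
  };
}
```

Rejecting on end also covers the Chrome-on-Windows case where only noise is heard and neither result nor error fires.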
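  33. INTERACTIVE FORM: STEP 3 - JS 1/2

        function formData(promise, label) {
          return promise.then(function() {
            // Ask the question stored in the label's data-question attribute
            return Speech.speak(label.dataset.question);
          }).then(function() {
            return Speech.recognize();
          }).then(function(text) {
            // Fill the field the label points to with the recognized text
            var field = label.getAttribute('for');
            document.getElementById(field).value = text;
            return Promise.resolve(text);
          });
        }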
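  34. INTERACTIVE FORM: STEP 3 - JS 2/2

        // Assumption: `button` is the Start input from step 1
        var button = document.getElementById('btn-voice');

        button.addEventListener('click', function(event) {
          // Keep the submit button from submitting the form and reloading the page
          event.preventDefault();
          var fieldLabels = document.querySelectorAll('label');
          var promise = Promise.resolve();

          function formData(promise, label) {
            // code from the previous slide
          }

          // Chain one question/answer round per label, in document order
          for (var i = 0; i < fieldLabels.length; i++) {
            promise = formData(promise, fieldLabels[i]);
          }

          promise.then(function() {
            return Speech.speak('Thank you for filling!');
          }).catch(function(error) {
            alert(error);
          });
        });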
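The for loop builds the chain one label at a time, so each question is spoken only after the previous answer has been written into its field. The same sequential chaining can be expressed with Array.prototype.reduce. The sketch below uses a hypothetical helper, `runSequentially`, that accepts any array-like list (including the NodeList returned by querySelectorAll) and a step function with the same (promise, item) signature as formData.

```javascript
// Hypothetical helper: calls step(promise, item) for each item in order,
// threading the promise returned by each call into the next one.
function runSequentially(items, step) {
  return Array.prototype.reduce.call(items, function (promise, item) {
    return step(promise, item);
  }, Promise.resolve());
}
```

Inside the click handler, `runSequentially(fieldLabels, formData)` would then produce the same promise as the loop.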
  35. DICTATION: RECIPE Speech recognition. The actual code is a bit

    different: I simplified it for brevity and to fit the limited size of the screen.
  36. DICTATION: STEP 2 - JS 1/3

        var recognizer = new SpeechRecognition();
        recognizer.interimResults = true;
        recognizer.continuous = true;
        var transcr = document.getElementById('transcription');
        var currTranscr = document.createElement('span');
        currTranscr.id = 'current-transcription';
  37. DICTATION: STEP 2 - JS 2/3

        recognizer.addEventListener('result', function(event) {
          currTranscr.textContent = '';
          var i = event.resultIndex;
          while (i < event.results.length) {
            var result = event.results[i++];
            if (result.isFinal) {
              // Move the final text into the main transcription
              transcr.removeChild(currTranscr);
              transcr.textContent += result[0].transcript;
              transcr.appendChild(currTranscr);
            } else {
              // Show the interim text in the current-transcription span
              currTranscr.textContent += result[0].transcript;
            }
          }
        });
  38. DICTATION: STEP 2 - JS 3/3

        var btnStart = document.getElementById('btn-start');
        btnStart.addEventListener('click', function() {
          transcr.textContent = '';
          transcr.appendChild(currTranscr);
          recognizer.start();
        });

        var btnStop = document.getElementById('btn-stop');
        btnStop.addEventListener('click', function() {
          recognizer.stop();
        });
  39. ONE LAST DEMO... Video courtesy of Szymon Nowak (@szimek):

    https://www.youtube.com/watch?v=R8ejjVAZweg