Talking and listening to web pages - Topconf Tallinn 2015

TALKING AND LISTENING TO WEB PAGES
Aurelio De Rosa
Tallinn, Estonia - 18 November 2015

WEB DEVELOPER
CONTRIBUTE(D) TO ... jQuery, CanIUse, GitHub.js, PureCSS
WRITE(D) FOR ... SitePoint, Tuts+, .NET magazine, php[architect], Telerik

AUTHORED BOOKS
JQUERY IN ACTION (3RD EDITION)
INSTANT JQUERY SELECTORS
(Shameless self-promotion!)

WHAT WE'LL COVER
Natural language processing (NLP)
Why it matters
The Web Speech API
Speech recognition
Speech synthesis
Issues and inconsistencies
Demo

NATURAL LANGUAGE PROCESSING (NLP)
A field of computer science, artificial intelligence, and linguistics concerned with the interactions between computers and human (natural) languages.

NATURAL LANGUAGE PROCESSING (NLP)
It all started in 1950, when Alan Turing published an article titled “Computing Machinery and Intelligence” in which he proposed what is now called the Turing test.

IT'S NOT ALL ABOUT TEXT

ONCE UPON A TIME...

VOICEXML
It's an XML language for writing web pages that you interact with by listening to spoken prompts and other forms of audio, and that you control by providing spoken input.
Specifications: http://www.w3.org/TR/voicexml30/

VOICEXML: EXAMPLE
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <vxml version="2.0" lang="en">
      <form>
        <field name="city">
          <prompt>Where do you want to travel to?</prompt>
          <option>New York</option>
          <option>London</option>
          <option>Tokyo</option>
        </field>
        <block>
          <submit next="http://www.test.com" namelist="city"/>
        </block>
      </form>
    </vxml>

JAVA APPLET
It's an application written in Java and delivered to users in the form of bytecode through a web page. The applet is then executed within a Java Virtual Machine (JVM) in a process separate from the browser itself.

WHY I CARE

WHY YOU SHOULD CARE
A step ahead to fill the gap with native apps
Improve user experience
Feature needed by some applications such as navigators
Help people with disabilities

“DEMO IT OR IT DIDN'T HAPPEN”™
Register to our website
Name:
Surname:
Nationality:
Start
This demo can be found at https://jsbin.com/faguji/watch?output

WEB SPEECH API
The Web Speech API allows you to deal with two aspects of computer-human interaction: Automatic Speech Recognition (ASR) and Text-to-Speech (TTS).
Specifications: https://dvcs.w3.org/hg/speech-api/raw-file/tip/webspeechapi.html

WEB SPEECH API
Introduced at the end of 2012
Defines two interfaces: one for recognition and one for synthesis
Requires the user's permission before acquiring audio
Agnostic of the underlying technology

SPEECH RECOGNITION
There are two types of recognition available: one-shot and continuous. The first stops as soon as the user stops talking; the second must be stopped programmatically. To instantiate a new speech recognizer, call SpeechRecognition():
    var recognizer = new SpeechRecognition();

SPEECH RECOGNITION: BROWSER SUPPORT
IE/Edge   Chrome          Safari   Firefox   Opera
None      25+ (-webkit)   None     None      None
Data updated to 17th November 2015
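
Since Chrome exposes the constructor only behind the -webkit prefix, some feature detection helps before instantiating a recognizer. A minimal sketch, assuming a hypothetical helper named getSpeechRecognition (it is not part of the API or of the talk):

```javascript
// Hypothetical helper (not from the talk): returns the recognition
// constructor exposed by the given global object, or null if unsupported.
function getSpeechRecognition(scope) {
  return scope.SpeechRecognition || scope.webkitSpeechRecognition || null;
}

// In a browser the global object is `window`; elsewhere (e.g. Node.js)
// an empty object stands in, so detection simply yields null.
var Recognition = getSpeechRecognition(
  typeof window !== 'undefined' ? window : {}
);

if (Recognition) {
  var recognizer = new Recognition();
  recognizer.lang = 'en-US';
} else {
  console.log('Speech recognition is not supported here');
}
```

Checking the unprefixed name first means the same code should keep working if a browser later drops the prefix.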

SPEECH RECOGNITION: PROPERTIES
grammars*
lang
continuous
interimResults
maxAlternatives
serviceURI
*Up to Chrome 46, adding a grammar to the grammars property does nothing. This happens because "the group is currently discussing options for which grammar formats should be supported, how builtin grammar types are specified, and default grammars when not specified."

SPEECH RECOGNITION: METHODS
start()
stop()
abort()

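SPEECH RECOGNITION: EVENTS
audiostart, soundstart, speechstart, speechend, soundend, audioend, result*, nomatch, error, start, end
*Up to Chrome 46 on Windows, the recognizer doesn't fire the result or the nomatch event before the end event when only noises are produced (issue #428873).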

“IT'S SHOWTIME!”
Start
Stop
This demo can be found at https://jsbin.com/zesew/watch?output

SPEECH RECOGNITION: RESULTS
Results are obtained as an object (that implements the SpeechRecognitionEvent interface) passed as the first argument of the handler attached to the result event.
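
To make the shape of that object concrete, here is a minimal sketch that reads the alternatives of the newest result. The helper name getAlternatives is an assumption made for illustration; results, resultIndex, transcript and confidence come from the specification.

```javascript
// Hypothetical helper: lists every alternative of the newest result,
// in the order the recognizer reports them.
// `event` is expected to look like a SpeechRecognitionEvent.
function getAlternatives(event) {
  var result = event.results[event.resultIndex];
  var alternatives = [];
  for (var i = 0; i < result.length; i++) {
    alternatives.push({
      transcript: result[i].transcript,
      confidence: result[i].confidence
    });
  }
  return alternatives;
}

// Browser usage, given a recognizer created as shown earlier:
// recognizer.maxAlternatives = 3;
// recognizer.addEventListener('result', function (event) {
//   console.log(getAlternatives(event));
// });
```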

PROBLEM: SOMETIMES RECOGNITION SUCKS!
Imagine a user of your website or web app says a command but the recognizer returns the wrong string. Your system is good and asks the user to repeat it, but the recognition fails again. How can you get out of this loop?

(IDEAL) SOLUTION: GRAMMARS

SOLUTION: LEVENSHTEIN DISTANCE
An approach that isn't ideal but that you can use today.

LEVENSHTEIN DISTANCE: EXAMPLE
Commands available: "Send email", "Call"
Names in the phonebook: "Aurelio De Rosa", "Annarita Tranfici", "John Doe"
Recognized text:
Updated text:
Start
This demo can be found at https://jsbin.com/tevogu/watch?output
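
The idea can be sketched as follows: compute the edit distance between the recognized text and each known value, then keep the closest one. This is a minimal illustration, not the code of the demo; closestMatch is a hypothetical helper and the misrecognized input is made up.

```javascript
// Classic dynamic-programming Levenshtein distance: the minimum number of
// single-character insertions, deletions and substitutions turning a into b.
function levenshtein(a, b) {
  var previous = [];
  for (var j = 0; j <= b.length; j++) previous[j] = j;
  for (var i = 1; i <= a.length; i++) {
    var current = [i];
    for (var k = 1; k <= b.length; k++) {
      var cost = a.charAt(i - 1) === b.charAt(k - 1) ? 0 : 1;
      current[k] = Math.min(
        previous[k] + 1,        // deletion
        current[k - 1] + 1,     // insertion
        previous[k - 1] + cost  // substitution (or match)
      );
    }
    previous = current;
  }
  return previous[b.length];
}

// Hypothetical helper: picks the candidate closest to the recognized text,
// ignoring case.
function closestMatch(recognized, candidates) {
  var best = null;
  var bestDistance = Infinity;
  candidates.forEach(function (candidate) {
    var d = levenshtein(recognized.toLowerCase(), candidate.toLowerCase());
    if (d < bestDistance) {
      bestDistance = d;
      best = candidate;
    }
  });
  return best;
}

var names = ['Aurelio De Rosa', 'Annarita Tranfici', 'John Doe'];
console.log(closestMatch('Jon Do', names)); // prints "John Doe"
```

In practice you would probably also reject matches whose distance is too large, so that unrelated speech isn't silently mapped to a command.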

SPEECH SYNTHESIS
Provides text-to-speech functionality in the browser. This is especially useful for blind people and those with visual impairments in general. The feature is exposed via a speechSynthesis object that possesses static methods.

SPEECH SYNTHESIS: BROWSER SUPPORT
IE/Edge   Chrome   Safari   Firefox               Opera
None      33+      7+       31+ (behind a flag)   27+
Data updated to 17th November 2015

SPEECH SYNTHESIS: PROPERTIES
pending
speaking
paused*
*Up to Chrome 46, pausing the utterance isn't reflected in a change of the paused property (issue #425553).
**Up to Opera 33, when calling the methods there is a delay in updating the value of the property, which might result in unexpected behavior (issue #DNA-45911).

SPEECH SYNTHESIS: METHODS
speak()*
cancel()
pause()
resume()
getVoices()
*Up to Chrome 46, speak() doesn't support SSML and doesn't strip unrecognized tags (issue #428902).

SPEECH SYNTHESIS: VOICES
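
The available voices are returned by speechSynthesis.getVoices(), and each voice exposes, among others, a name and a lang. A minimal sketch of choosing a voice for a language, assuming a hypothetical helper named pickVoice; note that some browsers populate the list asynchronously, so it can be empty on the first call.

```javascript
// Hypothetical helper: returns the first voice whose language tag starts
// with the requested one (e.g. 'en' matches 'en-US'), or null if none does.
function pickVoice(voices, lang) {
  var wanted = lang.toLowerCase();
  for (var i = 0; i < voices.length; i++) {
    if (voices[i].lang.toLowerCase().indexOf(wanted) === 0) {
      return voices[i];
    }
  }
  return null;
}

// Browser-only usage; skipped where speechSynthesis does not exist.
if (typeof speechSynthesis !== 'undefined') {
  var voice = pickVoice(speechSynthesis.getVoices(), 'en');
  console.log(voice ? voice.name : 'No matching voice');
}
```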

SPEECH SYNTHESIS: EVENTS

SPEECH SYNTHESIS: UTTERANCE INTERFACE
The SpeechSynthesisUtterance interface represents the utterance (i.e. the text) that will be spoken by the synthesizer.

SPEECH SYNTHESIS: UTTERANCE PROPERTIES
text**
lang
voice
volume
rate
pitch
*Up to Chrome 46, the property doesn't work correctly (issue #376280).
**Up to Chrome 46, the text property can't be set to an SSML (Speech Synthesis Markup Language) document because SSML isn't supported and Chrome doesn't strip the unrecognized tags (issue #428902).

SPEECH SYNTHESIS: UTTERANCE EVENTS
start, end, error, pause, resume, mark*, boundary*
*mark and boundary are not supported by any browser because they are fired by the interaction with SSML documents.

SHOW ME TEH CODEZ
To set the text to emit, we can either pass it when instantiating an utterance object or set it later using the text property.

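EXAMPLE 1
    // Pass the text when creating the utterance
    var utterance = new SpeechSynthesisUtterance('Hello');
    utterance.lang = 'en-US';
    // The rate value is not legible in the source; 1.2 is an assumed placeholder
    utterance.rate = 1.2;
    utterance.addEventListener('end', function() {
      console.log('Speech completed');
    });
    speechSynthesis.speak(utterance);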

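EXAMPLE 2
    // Create an empty utterance and set the text afterwards
    var utterance = new SpeechSynthesisUtterance();
    utterance.text = 'Hello';
    utterance.lang = 'en-US';
    // The rate value is not legible in the source; 1.2 is an assumed placeholder
    utterance.rate = 1.2;
    utterance.addEventListener('end', function() {
      console.log('Speech completed');
    });
    speechSynthesis.speak(utterance);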

SPEECH SYNTHESIS: DEMO IT MAN!
I know my voice isn't very sexy, but I still want to say that this conference is wonderful and the audience of my talk is even better. You all rock!
This demo can be found at https://jsbin.com/cipepa/watch?output

HOW I DID IT

INTERACTIVE FORM: RECIPE
Promises (to avoid the callback hell)
Speech recognition
Speech synthesis
The actual code is a bit different, but I made the changes for the sake of brevity and because of the limited size of the screen.

INTERACTIVE FORM: STEP 1 - HTML
    <form id="form">
      <label for="name" data-question="What's your name?">Name:</label>
      <input id="name" />
      <label for="surname" data-question="What's your surname?">Surname:</label>
      <input id="surname" />
      <!-- Other label/element pairs here -->
      <input id="btn-voice" type="submit" value="Start" />
    </form>

INTERACTIVE FORM: STEP 2 - SUPPORT LIBRARY
Create a Speech object containing two methods, speak and recognize, that return a Promise.
    var Speech = {
      speak: function(text) {
        return new Promise(function(resolve, reject) { /* ... */ });
      },
      recognize: function() {
        return new Promise(function(resolve, reject) { /* ... */ });
      }
    };
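
The slide leaves the bodies of the two methods out. One way they might be filled in is sketched below; this is an assumption, not the code of the demo. The factory createSpeech is hypothetical: it receives the browser objects as arguments so that the sketch can also be exercised with stand-ins. Rejecting on the end event is a design choice made here so that the promise always settles.

```javascript
// A sketch of how the elided bodies might look; not the talk's actual code.
function createSpeech(synth, Utterance, Recognition) {
  return {
    // Resolves once the text has been spoken.
    speak: function (text) {
      return new Promise(function (resolve, reject) {
        var utterance = new Utterance(text);
        utterance.addEventListener('end', function () {
          resolve();
        });
        utterance.addEventListener('error', function () {
          reject(new Error('Speech synthesis failed'));
        });
        synth.speak(utterance);
      });
    },
    // Resolves with the transcript of the first alternative.
    recognize: function () {
      return new Promise(function (resolve, reject) {
        var recognizer = new Recognition();
        recognizer.addEventListener('result', function (event) {
          resolve(event.results[event.resultIndex][0].transcript);
        });
        recognizer.addEventListener('error', function (event) {
          reject(new Error('Recognition error: ' + event.error));
        });
        // If the session ends without a result, settle the promise anyway.
        // After a successful resolve() this call has no effect.
        recognizer.addEventListener('end', function () {
          reject(new Error('No speech recognized'));
        });
        recognizer.start();
      });
    }
  };
}

// In a supporting browser:
// var Speech = createSpeech(
//   speechSynthesis,
//   SpeechSynthesisUtterance,
//   window.SpeechRecognition || window.webkitSpeechRecognition
// );
```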

INTERACTIVE FORM: STEP 3 - JS 1/2
    function formData(promise, label) {
      return promise.then(function() {
          return Speech.speak(label.dataset.question);
        })
        .then(function() {
          return Speech.recognize();
        })
        .then(function(text) {
          var field = label.getAttribute('for');
          document.getElementById(field).value = text;
          return Promise.resolve(text);
        });
    }

INTERACTIVE FORM: STEP 3 - JS 2/2
    button.addEventListener('click', function(event) {
      var fieldLabels = document.querySelectorAll('label');
      var promise = Promise.resolve();
      function formData(promise, label) {
        // code here
      }
      // Chain one question per label, each starting after the previous one
      for (var i = 0; i < fieldLabels.length; i++) {
        promise = formData(promise, fieldLabels[i]);
      }
      promise.then(function() {
        return Speech.speak('Thank you for filling');
      })
      .catch(function(error) {
        alert(error);
      });
    });
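
The loop in this step builds a chain in which each question starts only after the previous answer has arrived. The same pattern in isolation might look like this; runSequentially is a hypothetical helper and the timers merely stand in for speaking and listening.

```javascript
// Hypothetical helper: runs `task` for each item strictly one after another,
// starting each call only when the previous promise has resolved.
function runSequentially(items, task) {
  return items.reduce(function (promise, item) {
    return promise.then(function () {
      return task(item);
    });
  }, Promise.resolve());
}

var order = [];
runSequentially(['name', 'surname', 'nationality'], function (field) {
  return new Promise(function (resolve) {
    // Simulated asynchronous work for one form field
    setTimeout(function () {
      order.push(field);
      resolve();
    }, 10);
  });
}).then(function () {
  console.log(order.join(', ')); // prints "name, surname, nationality"
});
```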

DICTATION: RECIPE
Speech recognition
The actual code is a bit different, but I made the changes for the sake of brevity and because of the limited size of the screen.

DICTATION: STEP 1 - HTML
    <div id="transcription" contenteditable="true"></div>
    <button id="btn-start">Start</button>
    <button id="btn-stop">Stop</button>

DICTATION: STEP 2 - JS 1/3
    var recognizer = new SpeechRecognition();
    recognizer.interimResults = true;
    recognizer.continuous = true;
    var transcr = document.getElementById('transcription');
    var currTranscr = document.createElement('span');
    currTranscr.id = 'current-transcription';

DICTATION: STEP 2 - JS 2/3
    recognizer.addEventListener('result', function(event) {
      currTranscr.textContent = '';
      var i = event.resultIndex;
      while (i < event.results.length) {
        var result = event.results[i++];
        if (result.isFinal) {
          // Move finalized text into the main transcription
          transcr.removeChild(currTranscr);
          transcr.textContent += result[0].transcript;
          transcr.appendChild(currTranscr);
        } else {
          // Interim text lives in the temporary span
          currTranscr.textContent += result[0].transcript;
        }
      }
    });

DICTATION: STEP 2 - JS 3/3
    var btnStart = document.getElementById('btn-start');
    btnStart.addEventListener('click', function() {
      transcr.textContent = '';
      transcr.appendChild(currTranscr);
      recognizer.start();
    });
    var btnStop = document.getElementById('btn-stop');
    btnStop.addEventListener('click', function() {
      recognizer.stop();
    });
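
The handling of final and interim results can also be kept apart from the DOM updates, which makes it easier to reason about. A minimal sketch, assuming a hypothetical helper named splitTranscripts; isFinal, results and resultIndex come from the specification.

```javascript
// Hypothetical helper: splits the results reported by a `result` event
// into finalized text and still-changing (interim) text.
function splitTranscripts(event) {
  var finalText = '';
  var interimText = '';
  for (var i = event.resultIndex; i < event.results.length; i++) {
    var result = event.results[i];
    if (result.isFinal) {
      finalText += result[0].transcript;
    } else {
      interimText += result[0].transcript;
    }
  }
  return { finalText: finalText, interimText: interimText };
}

// Browser usage inside a result handler:
// var parts = splitTranscripts(event);
// append parts.finalText to the transcription and show parts.interimText
// in the temporary element.
```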

ONE LAST DEMO...
Video courtesy of Szymon Nowak (@szimek): https://www.youtube.com/watch?v=R8ejjVAZweg

THANK YOU!

QUESTIONS?

CONTACTS
Website: www.audero.it
Email: [email protected]
Twitter: @AurelioDeRosa
