Slide 1

Slide 1 text

TALKING AND LISTENING TO WEB PAGES Aurelio De Rosa Tallinn, Estonia - 18 November 2015

Slide 2

Slide 2 text

WEB DEVELOPER CONTRIBUTE(D) TO ... jQuery CanIUse GitHub.js PureCSS WRITE(D) FOR ... SitePoint Tuts+ .NET magazine php [architect] Telerik

Slide 3

Slide 3 text

AUTHORED BOOKS JQUERY IN ACTION (3RD EDITION) INSTANT JQUERY SELECTORS (Shameless self-promotion!)

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

WHAT WE'LL COVER
Natural language processing (NLP)
Why it matters
The Web Speech API
Speech recognition
Speech synthesis
Issues and inconsistencies
Demo

Slide 6

Slide 6 text

NATURAL LANGUAGE PROCESSING (NLP) A field of computer science, artificial intelligence, and linguistics concerned with the interactions between computers and human (natural) languages.

Slide 7

Slide 7 text

NATURAL LANGUAGE PROCESSING (NLP) It all started in 1950 when Alan Turing published an article titled “Computing Machinery and Intelligence” where he proposed what is now called the Turing test.

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

IT'S NOT ALL ABOUT TEXT

Slide 10

Slide 10 text

ONCE UPON A TIME...

Slide 11

Slide 11 text

VOICEXML It's an XML language for writing Web pages you interact with by listening to spoken prompts and other forms of audio that you can control by providing spoken inputs. Specifications: http://www.w3.org/TR/voicexml30/

Slide 12

Slide 12 text

VOICEXML: EXAMPLE
<?xml version="1.0" encoding="ISO…"?>
<vxml version="…" lang="en">
  <form>
    <field name="city">
      <prompt>Where do you want to travel to?</prompt>
      <option>New…

Slide 13

Slide 13 text

JAVA APPLET It's an application written in Java and delivered to users in the form of bytecode through a web page. The applet is then executed within a Java Virtual Machine (JVM) in a process separated from the browser itself.

Slide 14

Slide 14 text

WHY I CARE

Slide 15

Slide 15 text

WHY YOU SHOULD CARE
A step forward in closing the gap with native apps
Improves the user experience
A feature some applications need, such as navigation apps
Helps people with disabilities

Slide 16

Slide 16 text

“DEMO IT OR IT DIDN'T HAPPEN”™ Register to our website Name: Surname: Nationality: Start This demo can be found at https://jsbin.com/faguji/watch?output

Slide 17

Slide 17 text

No content

Slide 18

Slide 18 text

WEB SPEECH API The Web Speech API allows you to deal with two aspects of the computer-human interaction: Automatic Speech Recognition (ASR) and Text-to-Speech (TTS). Specifications: https://dvcs.w3.org/hg/speech-api/raw-file/tip/webspeechapi.html

Slide 19

Slide 19 text

WEB SPEECH API
Introduced at the end of 2012
Defines two interfaces: one for recognition and one for synthesis
Requires the user's permission before acquiring audio
Agnostic of the underlying technology

Slide 20

Slide 20 text

SPEECH RECOGNITION There are two types of recognition available: one-shot and continuous. The first stops as soon as the user stops talking; the second must be stopped programmatically. To instantiate a new speech recognizer you have to call SpeechRecognition():
var recognizer = new SpeechRecognition();
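A minimal sketch of feature detection before instantiating the recognizer, assuming the constructor may be exposed either unprefixed or with the webkit prefix (as Chrome did at the time of this talk). The helper name getSpeechRecognition is hypothetical and not part of the API.

```javascript
// Hypothetical helper: returns the recognizer constructor exposed by the
// given global object, or null when speech recognition is unavailable.
function getSpeechRecognition(globalObject) {
  return globalObject.SpeechRecognition ||
         globalObject.webkitSpeechRecognition ||
         null;
}

var Recognition = getSpeechRecognition(globalThis);
if (Recognition) {
  var recognizer = new Recognition();
  recognizer.continuous = false; // one-shot: stops when the user stops talking
  recognizer.lang = 'en-US';
} else {
  console.log('Speech recognition is not supported here.');
}
```

Passing the global object in, rather than reading window directly, keeps the helper usable outside a browser and easy to check with a stub.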

Slide 21

Slide 21 text

SPEECH RECOGNITION: BROWSER SUPPORT
IE/Edge: None
Chrome: 25+ (-webkit)
Safari: None
Firefox: None
Opera: None
Data updated to 17th November 2015

Slide 22

Slide 22 text

SPEECH RECOGNITION: PROPERTIES
continuous
grammars*
interimResults
lang
maxAlternatives
serviceURI
*Up to Chrome 46, adding a grammar to the grammars property does nothing. This happens because the group behind the specification is still discussing which grammar formats should be supported, how built-in grammar types are specified, and which default grammars apply when none is specified.

Slide 23

Slide 23 text

SPEECH RECOGNITION: METHODS
start()
stop()
abort()

Slide 24

Slide 24 text

SPEECH RECOGNITION: EVENTS
start
end*
audiostart
audioend
soundstart
soundend
speechstart
speechend
result
nomatch
error
*Up to Chrome 46 on Windows, neither the result nor the error event is fired before the end event when only noise is produced (issue #428873).

Slide 25

Slide 25 text

“IT'S SHOWTIME!” Start Stop This demo can be found at https://jsbin.com/zesew/watch?output

Slide 26

Slide 26 text

SPEECH RECOGNITION: RESULTS Results are obtained as an object (that implements the SpeechRecognitionEvent interface) passed as the first argument of the handler attached to the result event.
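A small sketch of how the object passed to a result handler might be read. It assumes the event exposes resultIndex and a list-like results whose items carry isFinal and indexed alternatives with a transcript; the function name readResults is hypothetical.

```javascript
// Hypothetical helper: collects the first alternative of every new result.
// `event` is expected to look like a SpeechRecognitionEvent: it has a
// `resultIndex` and a list-like `results` whose items expose `isFinal`
// and indexed alternatives with a `transcript`.
function readResults(event) {
  var output = { finalText: '', interimText: '' };
  for (var i = event.resultIndex; i < event.results.length; i++) {
    var result = event.results[i];
    var first = result[0]; // first alternative, typically the most likely one
    if (result.isFinal) {
      output.finalText += first.transcript;
    } else {
      output.interimText += first.transcript;
    }
  }
  return output;
}
```

In a page, this could be called from a handler attached to the result event of a recognizer, for example to show interim text in grey and final text in black.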

Slide 27

Slide 27 text

PROBLEM: SOMETIMES RECOGNITION SUCKS! Imagine a user of your website or web app says a command but the recognizer returns the wrong string. Your system is good and it asks the user to repeat it, but the recognition fails again. How can you get out of this loop?

Slide 28

Slide 28 text

(IDEAL) SOLUTION: GRAMMARS

Slide 29

Slide 29 text

SOLUTION: LEVENSHTEIN DISTANCE An approach that isn't ideal but that you can use today.

Slide 30

Slide 30 text

LEVENSHTEIN DISTANCE: EXAMPLE
Commands available: "Send email", "Call"
Names in the phonebook: "Aurelio De Rosa", "Annarita Tranfici", "John Doe"
Recognized text:
Updated text:
Start
This demo can be found at https://jsbin.com/tevogu/watch?output
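A minimal sketch of the idea, not the code of the demo: compute the Levenshtein distance between the recognized text and each known phrase, then keep the closest one. The helper closestMatch and the misrecognized input string are assumptions made for illustration.

```javascript
// Classic dynamic-programming Levenshtein distance between two strings.
function levenshtein(a, b) {
  var previous = [];
  for (var j = 0; j <= b.length; j++) {
    previous[j] = j;
  }
  for (var i = 1; i <= a.length; i++) {
    var current = [i];
    for (var k = 1; k <= b.length; k++) {
      var cost = a.charAt(i - 1) === b.charAt(k - 1) ? 0 : 1;
      current[k] = Math.min(
        previous[k] + 1,       // deletion
        current[k - 1] + 1,    // insertion
        previous[k - 1] + cost // substitution
      );
    }
    previous = current;
  }
  return previous[b.length];
}

// Hypothetical helper: returns the candidate closest to the recognized text,
// comparing case-insensitively.
function closestMatch(recognized, candidates) {
  var best = null;
  var bestDistance = Infinity;
  candidates.forEach(function (candidate) {
    var distance = levenshtein(recognized.toLowerCase(), candidate.toLowerCase());
    if (distance < bestDistance) {
      bestDistance = distance;
      best = candidate;
    }
  });
  return best;
}

var names = ['Aurelio De Rosa', 'Annarita Tranfici', 'John Doe'];
console.log(closestMatch('Aurelia de Rossa', names)); // prints "Aurelio De Rosa"
```

A real application would probably also set a maximum acceptable distance, so that text unrelated to every candidate is rejected instead of being forced onto the nearest one.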

Slide 31

Slide 31 text

No content

Slide 32

Slide 32 text

SPEECH SYNTHESIS Provides text-to-speech functionality in the browser. This is especially useful for blind people and those with visual impairments in general. The feature is exposed via a speechSynthesis object that exposes static methods.

Slide 33

Slide 33 text

SPEECH SYNTHESIS: BROWSER SUPPORT
IE/Edge: None
Chrome: 33+
Safari: 7+
Firefox: 31+ (behind a flag)
Opera: 27+
Data updated to 17th November 2015

Slide 34

Slide 34 text

SPEECH SYNTHESIS: PROPERTIES
pending
speaking
paused* **
*Up to Chrome 46, pausing the utterance isn't reflected in the paused property (issue #425553).
**Up to Opera 33, calling the pause()/resume() methods updates the value of the paused property with a delay, which might result in unexpected behavior (issue #DNA-45911).

Slide 35

Slide 35 text

SPEECH SYNTHESIS: METHODS
speak()*
cancel()
pause()
resume()
getVoices()
*Up to Chrome 46, speak() doesn't support SSML and doesn't strip unrecognized tags (issue #428902).

Slide 36

Slide 36 text

SPEECH SYNTHESIS: VOICES

Slide 37

Slide 37 text

SPEECH SYNTHESIS: EVENTS
voiceschanged
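A hedged sketch of how getVoices() and the voiceschanged event might be combined to choose a voice. The helper pickVoice and its fallback policy (exact language match first, then the default voice) are assumptions made for illustration; the browser-only part is guarded so the snippet does nothing where speechSynthesis is missing.

```javascript
// Hypothetical helper: picks the first voice whose `lang` matches the
// requested language tag, falling back to the default voice, then to null.
function pickVoice(voices, lang) {
  var match = null;
  var fallback = null;
  voices.forEach(function (voice) {
    if (!match && voice.lang === lang) {
      match = voice;
    }
    if (!fallback && voice.default) {
      fallback = voice;
    }
  });
  return match || fallback;
}

// Browser-only part: the list of voices may be filled in asynchronously,
// so it is read once immediately and again whenever it changes.
if (typeof speechSynthesis !== 'undefined') {
  var logVoice = function () {
    var voice = pickVoice(speechSynthesis.getVoices(), 'en-US');
    console.log(voice ? voice.name : 'No suitable voice found');
  };
  logVoice();
  speechSynthesis.onvoiceschanged = logVoice;
}
```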

Slide 38

Slide 38 text

SPEECH SYNTHESIS: UTTERANCE INTERFACE The SpeechSynthesisUtterance interface represents the utterance (i.e. the text) that will be spoken by the synthesizer.

Slide 39

Slide 39 text

SPEECH SYNTHESIS: UTTERANCE PROPERTIES
lang
pitch
rate*
text**
voice
volume
*Up to Chrome 46, the rate property doesn't work correctly (issue #376280).
**Up to Chrome 46, the text property can't be set to an SSML (Speech Synthesis Markup Language) document because SSML isn't supported and Chrome doesn't strip the unrecognized tags (issue #428902).

Slide 40

Slide 40 text

SPEECH SYNTHESIS: UTTERANCE EVENTS
start
end
pause
resume
boundary*
mark*
error
*boundary and mark are not supported by any browser because they are fired only when interacting with SSML documents.

Slide 41

Slide 41 text

SHOW ME TEH CODEZ To set the text to emit, we can either pass it when instantiating an utterance object or set it later using the text property.

Slide 42

Slide 42 text

EXAMPLE 1
var utterance = new SpeechSynthesisUtterance('Hello');
utterance.lang = 'en-US';
utterance.rate = 1.2; // illustrative value: the original number was lost
utterance.addEventListener('end', function() {
  console.log('Speech completed');
});
speechSynthesis.speak(utterance);

Slide 43

Slide 43 text

EXAMPLE 2
var utterance = new SpeechSynthesisUtterance();
utterance.text = 'Hello';
utterance.lang = 'en-US';
utterance.rate = 1.2; // illustrative value: the original number was lost
utterance.addEventListener('end', function() {
  console.log('Speech completed');
});
speechSynthesis.speak(utterance);

Slide 44

Slide 44 text

SPEECH SYNTHESIS: DEMO IT MAN! I know my voice isn't very sexy, but I still want to say that this conference is wonderful and the audience of my talk is even better. You all rock! This demo can be found at https://jsbin.com/cipepa/watch?output

Slide 45

Slide 45 text

HOW I DID IT

Slide 46

Slide 46 text

INTERACTIVE FORM: RECIPE
Promises (to avoid callback hell)
Speech recognition
Speech synthesis
The actual code is a bit different: I simplified it for brevity and to fit the limited size of the screen.

Slide 47

Slide 47 text

INTERACTIVE FORM: STEP 1 - HTML
<form id="form">
  <label for="name" data-question="What's your name?">Name:</label>
  <input id="name" />
  <label for="surname" data-question="What's your surname?">Surname:</label>
  <input id="surname" />
  <!-- Other label/element pairs here -->
  <input id="btn-voice" type="submit" value="Start" />
</form>

Slide 48

Slide 48 text

INTERACTIVE FORM: STEP 2 - SUPPORT LIBRARY
Create a Speech object containing two methods, speak and recognize, that return a Promise.
var Speech = {
  speak: function(text) {
    return new Promise(function(resolve, reject) {...});
  },
  recognize: function() {
    return new Promise(function(resolve, reject) {...});
  }
};
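The bodies of the two promises are elided on the slide. One possible way to fill them in is sketched below; it is an assumption, not the author's actual code. The factory createSpeech and its env parameter are hypothetical: env stands for the object that provides speechSynthesis, SpeechSynthesisUtterance, and the (possibly webkit-prefixed) SpeechRecognition constructor, which in a page would be window.

```javascript
// Hypothetical factory: `env` supplies the browser objects, which keeps the
// sketch usable with stubs. In a page you would pass `window`.
function createSpeech(env) {
  var Recognition = env.SpeechRecognition || env.webkitSpeechRecognition;
  return {
    // Resolves when the utterance has been spoken, rejects on error.
    speak: function (text) {
      return new Promise(function (resolve, reject) {
        var utterance = new env.SpeechSynthesisUtterance(text);
        utterance.addEventListener('end', function () {
          resolve();
        });
        utterance.addEventListener('error', function (event) {
          reject(new Error('Speech synthesis failed: ' + event.error));
        });
        env.speechSynthesis.speak(utterance);
      });
    },
    // Resolves with the first transcript recognized, rejects otherwise.
    recognize: function () {
      return new Promise(function (resolve, reject) {
        if (!Recognition) {
          reject(new Error('Speech recognition is not supported'));
          return;
        }
        var recognizer = new Recognition();
        recognizer.addEventListener('result', function (event) {
          resolve(event.results[event.resultIndex][0].transcript);
        });
        recognizer.addEventListener('error', function (event) {
          reject(new Error('Speech recognition failed: ' + event.error));
        });
        // If recognition ends with no result, settle the promise anyway.
        // Rejecting after a resolve has no effect.
        recognizer.addEventListener('end', function () {
          reject(new Error('No speech was recognized'));
        });
        recognizer.start();
      });
    }
  };
}
```

With this sketch, the Speech object used in the following steps could be obtained with var Speech = createSpeech(window);.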

Slide 49

Slide 49 text

INTERACTIVE FORM: STEP 3 - JS 1/2
function formData(promise, label) {
  return promise.then(function() {
    return Speech.speak(label.dataset.question);
  }).then(function() {
    return Speech.recognize();
  }).then(function(text) {
    var field = label.getAttribute('for');
    document.getElementById(field).value = text;
    return Promise.resolve(text);
  });
}

Slide 50

Slide 50 text

INTERACTIVE FORM: STEP 3 - JS 2/2
button.addEventListener('click', function(event) {
  var fieldLabels = document.querySelectorAll('label');
  var promise = Promise.resolve();
  function formData(promise, label) {
    // code here
  }
  for (var i = 0; i < fieldLabels.length; i++) {
    promise = formData(promise, fieldLabels[i]);
  }
  promise.then(function() {
    return Speech.speak('Thank you for filling');
  }).catch(function(error) {
    alert(error);
  });
});
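The loop in this step chains one promise per label, so each question is asked only after the previous answer has arrived. The same pattern can be sketched in isolation; runSequentially and fakeAsk are hypothetical names, and fakeAsk merely stands in for "speak the question, then listen".

```javascript
// Hypothetical helper: runs `task(item)` for each item, starting a task only
// after the promise returned by the previous one has been fulfilled.
function runSequentially(items, task) {
  return items.reduce(function (promise, item) {
    return promise.then(function () {
      return task(item);
    });
  }, Promise.resolve());
}

// Stand-in for "speak the question, then listen": resolves after a short delay.
function fakeAsk(question) {
  return new Promise(function (resolve) {
    setTimeout(function () {
      console.log('Asked: ' + question);
      resolve();
    }, 10);
  });
}

runSequentially(["What's your name?", "What's your surname?"], fakeAsk)
  .then(function () {
    console.log('Thank you for filling');
  });
```

Because each step is attached with then, a rejection in any task skips the remaining ones and can be handled by a single catch at the end of the chain.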

Slide 51

Slide 51 text

DICTATION: RECIPE
Speech recognition
The actual code is a bit different: I simplified it for brevity and to fit the limited size of the screen.

Slide 52

Slide 52 text

DICTATION: STEP 1 - HTML
<div id="transcription" contenteditable="true"></div>
<button id="btn-start">Start</button>
<button id="btn-stop">Stop</button>

Slide 53

Slide 53 text

DICTATION: STEP 2 - JS 1/3
var recognizer = new SpeechRecognition();
recognizer.interimResults = true;
recognizer.continuous = true;
var transcr = document.getElementById('transcription');
var currTranscr = document.createElement('span');
currTranscr.id = 'current-transcription';

Slide 54

Slide 54 text

DICTATION: STEP 2 - JS 2/3
recognizer.addEventListener('result', function(event) {
  currTranscr.textContent = '';
  var i = event.resultIndex;
  while (i < event.results.length) {
    var result = event.results[i++];
    if (result.isFinal) {
      transcr.removeChild(currTranscr);
      transcr.textContent += result[0].transcript;
      transcr.appendChild(currTranscr);
    } else {
      currTranscr.textContent += result[0].transcript;
    }
  }
});

Slide 55

Slide 55 text

DICTATION: STEP 2 - JS 3/3
var btnStart = document.getElementById('btn-start');
btnStart.addEventListener('click', function() {
  transcr.textContent = '';
  transcr.appendChild(currTranscr);
  recognizer.start();
});
var btnStop = document.getElementById('btn-stop');
btnStop.addEventListener('click', function() {
  recognizer.stop();
});

Slide 56

Slide 56 text

ONE LAST DEMO... Video courtesy of Szymon Nowak (@szimek): https://www.youtube.com/watch?v=R8ejjVAZweg

Slide 57

Slide 57 text

THANK YOU!

Slide 58

Slide 58 text

QUESTIONS?

Slide 59

Slide 59 text

CONTACTS
Website: www.audero.it
Email: [email protected]
Twitter: @AurelioDeRosa