Input Types
Touch: Manipulate space
Keyboard: Data entry
Voice: Command or query
Slide 5
Slide 5 text
55% of teens
41% of adults
use voice search every day*
*maybe
Slide 6
Slide 6 text
10% of Baidu
search queries are by voice
That’s ~500m per day
Slide 7
Slide 7 text
To Talk:
Speak
Listen
Understand
Slide 8
Slide 8 text
Synthesis
Slide 9
Slide 9 text
Chrome/Safari
var txt = 'Hello world',
    say = new SpeechSynthesisUtterance(txt);
window.speechSynthesis.speak(say);
The Web Speech A.P.I. Play
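A minimal sketch of the same call with a support check, since the slide names only Chrome and Safari. The `speakText` name and its `root` parameter are assumptions made for illustration (passing `root` lets the function be tried outside a browser); they are not part of the Web Speech API.

```javascript
// Speak a string if the environment exposes speech synthesis.
// `speakText` and `root` are hypothetical names, not part of the API.
function speakText(text, root) {
  root = root || (typeof window !== 'undefined' ? window : {});
  if (!root.speechSynthesis || !root.SpeechSynthesisUtterance) {
    return false; // synthesis unavailable; the caller can fall back to text
  }
  var utterance = new root.SpeechSynthesisUtterance(text);
  root.speechSynthesis.speak(utterance);
  return true;
}
```

In a browser, `speakText('Hello world')` would queue the utterance and return `true`; where synthesis is missing it returns `false` without throwing.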
Slide 10
Slide 10 text
SSU Attributes
var txt = 'Hello world',
    say = new SpeechSynthesisUtterance(txt);
say.lang = 'en-GB';
say.pitch = 0.75;
say.rate = 1.5;
say.volume = 0.5;
window.speechSynthesis.speak(say);
The Web Speech A.P.I. Play
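The attributes on this slide accept limited ranges: the Web Speech API specification gives pitch as 0 to 2, rate as 0.1 to 10, and volume as 0 to 1. The sketch below clamps values before assigning them. `clamp` and `applyVoiceSettings` are hypothetical helpers introduced here, not API methods.

```javascript
// Keep a number inside [min, max].
function clamp(value, min, max) {
  return Math.min(max, Math.max(min, value));
}

// Hypothetical helper: copy settings onto an utterance, clamping each
// numeric attribute to the range given in the Web Speech API specification.
function applyVoiceSettings(utterance, settings) {
  if (settings.lang !== undefined) { utterance.lang = settings.lang; }
  if (settings.pitch !== undefined) { utterance.pitch = clamp(settings.pitch, 0, 2); }
  if (settings.rate !== undefined) { utterance.rate = clamp(settings.rate, 0.1, 10); }
  if (settings.volume !== undefined) { utterance.volume = clamp(settings.volume, 0, 1); }
  return utterance;
}
```

With the slide's values, `applyVoiceSettings(say, { lang: 'en-GB', pitch: 0.75, rate: 1.5, volume: 0.5 })` leaves every value unchanged because all of them are already in range.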
Slide 11
Slide 11 text
Methods, Attrs, Events
Play / Pause / Resume /
Cancel / End / Error
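One way the controls and events on this slide might be wired together, as a rough sketch. `createPlayer`, its parameters, and the state names are assumptions for illustration; the calls on `synth` (`speak`, `pause`, `resume`, `cancel`) and the `onstart`, `onend` and `onerror` handlers are the parts that come from the API. Browsers differ in which events they fire after `cancel()`, so the error handler treats the `canceled` and `interrupted` codes as a normal stop.

```javascript
// Hypothetical wrapper that tracks playback state for a synthesis object.
// `synth` is expected to look like window.speechSynthesis and `Utterance`
// like window.SpeechSynthesisUtterance.
function createPlayer(synth, Utterance) {
  var state = 'idle';
  return {
    play: function (text) {
      var utterance = new Utterance(text);
      utterance.onstart = function () { state = 'speaking'; };
      utterance.onend = function () { state = 'idle'; };
      utterance.onerror = function (event) {
        var code = event && event.error;
        // A cancelled or interrupted utterance is not a real failure.
        state = (code === 'canceled' || code === 'interrupted') ? 'idle' : 'error';
      };
      synth.speak(utterance);
    },
    pause: function () {
      if (state === 'speaking') { synth.pause(); state = 'paused'; }
    },
    resume: function () {
      if (state === 'paused') { synth.resume(); state = 'speaking'; }
    },
    cancel: function () { synth.cancel(); state = 'idle'; },
    getState: function () { return state; }
  };
}
```

In a browser this could be created with `createPlayer(window.speechSynthesis, window.SpeechSynthesisUtterance)`.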
Slide 12
Slide 12 text
Synthesis As A Service
1. developer.att.com/apis/speech
2. ws.neospeech.com/
3. cereproc.com/en/products/cloud
4. ivona.com/en/for-business/speech-cloud/
Slide 13
Slide 13 text
Neospeech
The Web Speech A.P.I. 1
Slide 14
Slide 14 text
SSML
<speak version="1.0">
  <p>
    <s>This is <abbr>S.S.M.L.</abbr></s>
    <s>Speak <prosody rate="-20%">slowly</prosody>.</s>
  </p>
</speak>
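SSML is XML, so any text placed inside it needs escaping. The sketch below builds a document with the same `speak`, `p`, `s` and `prosody` elements as the slide. `escapeXml`, `renderPart` and `buildSsml` are hypothetical helpers, and a given synthesis service may expect extra attributes on `speak` (a namespace or language, for example) that are not shown here.

```javascript
// Escape the characters that are special in XML text and attribute values.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// A part is either plain text or { text: ..., rate: ... } for a prosody span.
function renderPart(part) {
  if (typeof part === 'string') { return escapeXml(part); }
  return '<prosody rate="' + escapeXml(part.rate) + '">' +
    escapeXml(part.text) + '</prosody>';
}

// Hypothetical builder: each sentence is an array of parts, wrapped in <s>;
// all sentences share one <p> inside <speak>.
function buildSsml(sentences) {
  var body = sentences.map(function (parts) {
    return '<s>' + parts.map(renderPart).join('') + '</s>';
  }).join('');
  return '<speak version="1.0"><p>' + body + '</p></speak>';
}
```

For example, `buildSsml([['Speak ', { text: 'slowly', rate: '-20%' }, '.']])` produces the second sentence of the slide's markup on a single line.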
Slide 15
Slide 15 text
Recognition
Slide 16
Slide 16 text
Challenges
1. Multiple users
2. Multiple languages
3. Accents
Slide 17
Slide 17 text
No content
Slide 18
Slide 18 text
Web Speech API
var recog = new SpeechRecognition();
x-browser
var speechRecognition = (window.SpeechRecognition || window.webkitSpeechRecognition);
var recog = new speechRecognition();
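The cross-browser lookup on this slide throws if neither constructor exists. A small sketch that returns `null` instead is shown below; `getSpeechRecognition`, `createRecognizer` and the `root` parameter are hypothetical names chosen for illustration.

```javascript
// Return the recognition constructor, prefixed or not, or null if absent.
function getSpeechRecognition(root) {
  return root.SpeechRecognition || root.webkitSpeechRecognition || null;
}

// Hypothetical helper: build a recognizer, or return null when unsupported.
function createRecognizer(root) {
  var Recognition = getSpeechRecognition(root);
  return Recognition ? new Recognition() : null;
}
```

In a page this would be called as `createRecognizer(window)`, with a `null` result used to hide or disable the voice input control.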
Slide 19
Slide 19 text
SpeechRecognition Methods
var recog = new SpeechRecognition();
recog.start();
recog.stop();
recog.abort();
Slide 20
Slide 20 text
SpeechRecognition Events
var recog = new SpeechRecognition();
recog.onresult = function () {};
recog.onnomatch = function () {};
recog.onerror = function () {};
Slide 21
Slide 21 text
MVS
var recog = new SpeechRecognition();
recog.onresult = function (result) {
  // First alternative of the first result
  output.textContent = result.results[0][0].transcript;
};
// Start listening on click, not when the handler is assigned
btn.onclick = function () { recog.start(); };
Click header for demo
Slide 22
Slide 22 text
SpeechRecognition Events
start
audiostart
soundstart
speechstart
speechend
soundend
audioend
end
Click header for demo
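To watch the order in which these events arrive, each one can be given a handler that records its name. This is a sketch only: `traceLifecycle` and `LIFECYCLE_EVENTS` are hypothetical names, while the `on` + event name handler properties (`onstart`, `onaudiostart`, and so on) are the ones the API defines.

```javascript
// The eight lifecycle events, in the order listed on the slide.
var LIFECYCLE_EVENTS = [
  'start', 'audiostart', 'soundstart', 'speechstart',
  'speechend', 'soundend', 'audioend', 'end'
];

// Hypothetical helper: push each event name onto `log` as it fires.
function traceLifecycle(recog, log) {
  LIFECYCLE_EVENTS.forEach(function (name) {
    recog['on' + name] = function () { log.push(name); };
  });
  return log;
}
```

Calling `traceLifecycle(recog, [])` before `recog.start()` and inspecting the returned array after the `end` event gives the sequence observed in a particular browser.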
Slide 23
Slide 23 text
Interim Results
var recog = new SpeechRecognition();
recog.interimResults = true;
recog.onresult = function (result) {
  if (result.results[0].isFinal) {
    …
  }
};
// Start listening on click, not when the handler is assigned
btn.onclick = function () { recog.start(); };
Click header for demo
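With interim results switched on, a result event can carry both settled and provisional text. The sketch below separates the two by walking the results from `event.resultIndex` and checking `isFinal` on each. `collectTranscripts` is a hypothetical helper; `resultIndex`, `results`, `isFinal` and `transcript` are the names the API uses.

```javascript
// Hypothetical helper: split the results in one event into final text and
// still-changing interim text, taking the first alternative of each result.
function collectTranscripts(event) {
  var finalText = '';
  var interimText = '';
  for (var i = event.resultIndex; i < event.results.length; i++) {
    var result = event.results[i];
    var transcript = result[0].transcript;
    if (result.isFinal) {
      finalText += transcript;
    } else {
      interimText += transcript;
    }
  }
  return { finalText: finalText, interimText: interimText };
}
```

A handler might show `interimText` greyed out while the user is speaking and append `finalText` to the page once it arrives.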
Slide 24
Slide 24 text
Continuous
var recog = new SpeechRecognition();
recog.continuous = true;
recog.onresult = function (result) {
  output.textContent = result.results[0][0].transcript;
};
btn.onclick = function () {
  if (listening) { recog.stop(); }
  else { recog.start(); }
}
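The click handler on this slide reads a `listening` flag that the slide never sets. One possible way to maintain it, offered as an assumption and not as the author's approach, is to update the flag from the recognizer's own `start` and `end` events. `createToggle` is a hypothetical name.

```javascript
// Hypothetical helper: returns a function that starts or stops recognition
// depending on whether the recognizer is currently listening.
function createToggle(recog) {
  var listening = false;
  recog.onstart = function () { listening = true; };
  recog.onend = function () { listening = false; };
  return function toggle() {
    if (listening) {
      recog.stop();
      return 'stop';
    }
    recog.start();
    return 'start';
  };
}
```

It could then be attached with `btn.onclick = createToggle(recog);`. Tracking state from events also covers the case where the browser ends recognition on its own.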
Slide 25
Slide 25 text
SpeechRTC +
Web Speech API
Slide 26
Slide 26 text
Tea. Earl Grey. Hot.
Slide 27
Slide 27 text
[Google demo]
Slide 28
Slide 28 text
JuliusJS
var recog = new Julius();
recog.onrecognition = function (result) {
  console.log(result);
}
Slide 29
Slide 29 text
No content
Slide 30
Slide 30 text
No content
Slide 31
Slide 31 text
No content
Slide 32
Slide 32 text
Wit + Node
1. Web Speech API + text
2. Direct speech: GuM, Web Audio API
Slide 33
Slide 33 text
Wit.ai
Microphone.js
Slide 34
Slide 34 text
[Wit dashboard demo]
Slide 35
Slide 35 text
Wit.ai
Response
Slide 36
Slide 36 text
No content
Slide 37
Slide 37 text
Use Cases
No screen
Busy hands
Slide 38
Slide 38 text
Amazon : ‘Alexa’
Apple : Siri
Google : Voice Search
Microsoft : Cortana
Slide 39
Slide 39 text
No content
Slide 40
Slide 40 text
No content
Slide 41
Slide 41 text
No content
Slide 42
Slide 42 text
Closed
Systems :(
Slide 43
Slide 43 text
The End
Thanks for your
patience.
Slide 44
Slide 44 text
Copyright Note
The video clips in this presentation from the films Moon, Star
Trek: The Voyage Home, Her, Star Trek, and 2001: A Space
Odyssey belong to their respective copyright holders and are
used here without permission.