The History and Future of Speaking with Machines

by Voxable

Slide 1

Slide 1 text

The History and Future of Speaking with Machines
By Matt Buck & Lauren Golembiewski
#givevoice

Slide 2

Slide 2 text

Overview
• The history of speaking with machines
• The principles of conversational design
• The future of speaking with machines
• Demonstration of building a voice interface

Slide 3

Slide 3 text

THE HISTORY OF SPEAKING WITH MACHINES #givevoice

Slide 4

Slide 4 text

Talking with Machines
• Speech recognition
• Speech synthesis

Slide 5

Slide 5 text

Roger Bacon
• c. 1219 – c. 1292
• English philosopher and Franciscan friar
• Proponent of the scientific method

Slide 6

Slide 6 text

Brazen Heads
• Automatons
• Not real (obvs)
• Could answer any question put to them
https://en.wikipedia.org/wiki/Speech_synthesis#History

Slide 7

Slide 7 text

1700s: Early speech synthesis
• 1779: Christian Gottlieb Kratzenstein models the human vocal tract
• 1791: Wolfgang von Kempelen’s “acoustic-mechanical speech machine”
https://en.wikipedia.org/wiki/Wolfgang_von_Kempelen's_Speaking_Machine

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

1846: Euphonia
• Created by Joseph Faber
• Also played like an organ
• Modeled an entire head
• Spoke three languages
https://irrationalgeographic.wordpress.com/2009/06/24/joseph-fabers-talking-euphonia/

Slide 10

Slide 10 text

Alexander Graham Bell
https://irrationalgeographic.wordpress.com/2009/06/24/joseph-fabers-talking-euphonia/

Slide 11

Slide 11 text

https://www.flickr.com/photos/64416865@N00/5451405887/in/photostream/

Slide 12

Slide 12 text

https://www.flickr.com/photos/64416865@N00/5451405887/in/photostream/

Slide 13

Slide 13 text

https://www.flickr.com/photos/64416865@N00/5451405887/in/photostream/

Slide 14

Slide 14 text

99% Invisible, Ep. 208: “Vox Ex Machina”
http://99percentinvisible.org/episode/vox-ex-machina/

Slide 15

Slide 15 text

1940: Voder World’s Fair Demo
https://www.youtube.com/watch?v=0rAyrmm7vv0
http://hackaday.com/2014/08/12/retrotechtacular-the-voder-from-bell-labs/

Slide 16

Slide 16 text

1961: The First Machine to Sing
https://www.youtube.com/watch?v=41U78QP8nBk
http://speechstones.com/milestones.html

Slide 17

Slide 17 text

2001: Still Singing
https://www.youtube.com/watch?v=OuEN5TjYRCE
http://speechstones.com/milestones.html

Slide 18

Slide 18 text

1969: John Robinson Pierce
• One of the most influential engineers of the 20th century
• Coined the term “transistor”
• “There are strong reasons for believing that spoken English is… not recognizable phoneme by phoneme or word by word.”
https://en.wikipedia.org/wiki/Speech_recognition#History

Slide 19

Slide 19 text

1978: The Texas Instruments Speak & Spell
• Used TI’s Solid State Speech technology
• First toy to use synthesized speech
https://commons.wikimedia.org/wiki/File:TI_SpeakSpell_no_shadow.jpg

Slide 20

Slide 20 text

1978: The TI Speak & Spell
https://www.youtube.com/watch?v=qM8FcN0aAvU

Slide 21

Slide 21 text

1986: IBM Tangora
• Interest in speech recognition reignited by DARPA grants in the early 1970s
• Tangora: a voice-activated word processor with a 20,000-word vocabulary

Slide 22

Slide 22 text

No content

Slide 23

Slide 23 text

IVR

Slide 24

Slide 24 text

Interactive Voice Response

Slide 25

Slide 25 text

2011: Siri
• Integrated with core Apple apps and Wolfram Alpha
• Solved real problems

Slide 26

Slide 26 text

2016: Google WaveNet
• Neural network capable of generating speech
• Can mimic any human voice
• Reduces the gap between the best existing text-to-speech systems and human performance by over 50%
https://deepmind.com/blog/wavenet-generative-model-raw-audio/

Slide 27

Slide 27 text

2014–2016: Amazon Echo & Google Home
• Ubiquitous voice interfaces
• Always on; react to “wake words”
• Siri-like functionality
• Control smart-home devices

Slide 28

Slide 28 text

Tea. Earl Grey. Hot.

Slide 29

Slide 29 text

The History of Speaking with Machines
• Roger Bacon’s Brazen Head
• Early attempts at speech synthesis
• The 20th century
• Demonstration of building a voice interface

Slide 30

Slide 30 text

THE PRINCIPLES OF VOICE INTERFACE DESIGN #givevoice

Slide 31

Slide 31 text

The Core Technology: The Voice User Interface
• Automatic Speech Recognition (ASR)
• Natural Language Understanding (NLU)
• Bot Intelligence
• Text to Speech (TTS)
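The four components on this slide form a pipeline in which each stage feeds the next. The Python sketch below shows only that data flow, under stated assumptions: every stage is a hypothetical stub (no real speech engine is called, and the "audio" is just UTF-8 bytes), and the names `asr`, `nlu`, `bot_intelligence`, `tts`, and `handle_turn` are invented for illustration rather than taken from any product's API.

```python
from dataclasses import dataclass, field


@dataclass
class Interpretation:
    """Structured data produced by the NLU stage (hypothetical shape)."""
    intent: str
    slots: dict = field(default_factory=dict)


def asr(audio: bytes) -> str:
    # Stub ASR: a real engine would transcribe speech. Here the "audio"
    # is assumed to already be UTF-8 text so the pipeline runs end to end.
    return audio.decode("utf-8")


def nlu(text: str) -> Interpretation:
    # Stub NLU: a real service would classify the intent and extract slots.
    if "flight" in text.lower():
        return Interpretation(intent="FLIGHTSEARCH")
    return Interpretation(intent="UNKNOWN")


def bot_intelligence(meaning: Interpretation) -> str:
    # Decide what to say back, based only on the structured interpretation.
    if meaning.intent == "FLIGHTSEARCH":
        return "Sure thing! Let me look for flights."
    return "Sorry, I can only help with flights."


def tts(reply: str) -> bytes:
    # Stub TTS: a real engine would return synthesized audio.
    return reply.encode("utf-8")


def handle_turn(audio: bytes) -> bytes:
    """Run one conversational turn: ASR -> NLU -> bot intelligence -> TTS."""
    return tts(bot_intelligence(nlu(asr(audio))))


print(handle_turn(b"Show me flights to Atlanta").decode("utf-8"))
```

Keeping each stage behind its own function boundary means one ASR or TTS provider could be swapped for another without touching the conversation logic.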

Slide 32

Slide 32 text

The Core Technology: Automatic Speech Recognition (ASR)
Takes the spoken word and turns it into text.

Slide 33

Slide 33 text

No content

Slide 34

Slide 34 text

The Core Technology: Natural Language Understanding (NLU)
Gives the text meaning by turning it into structured data.

Slide 35

Slide 35 text

The Core Technology: Bot Intelligence
Bot Intelligence manages the context of the user, the application, and the conversation.
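One way to picture these three kinds of context is as three separate stores that the bot consults on every turn. The Python sketch below is an illustration only: the `Context` class, its field names, and the rule of falling back to the user's home city when no departure city was spoken are all assumptions made for this example, not a prescribed design.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Context:
    """Hypothetical holder for the three kinds of context on the slide."""

    user: dict = field(default_factory=dict)          # e.g. profile, home city
    application: dict = field(default_factory=dict)   # e.g. features the bot offers
    conversation: dict = field(default_factory=dict)  # slot values heard so far

    def supports(self, feature: str) -> bool:
        # Application context: can this bot do the requested job at all?
        return feature in self.application.get("features", set())

    def remember(self, slots: dict) -> None:
        # Conversation context: keep every slot the NLU managed to fill,
        # skipping the ones it left empty (None).
        for name, value in slots.items():
            if value is not None:
                self.conversation[name] = value

    def resolve(self, slot: str, user_fallback: Optional[str] = None) -> Any:
        # Prefer what was said in this conversation; otherwise fall back
        # to a named field of the user's profile, if one was given.
        if slot in self.conversation:
            return self.conversation[slot]
        if user_fallback is not None:
            return self.user.get(user_fallback)
        return None


ctx = Context(user={"home_city": "Austin"}, application={"features": {"flights"}})
ctx.remember({"destination_city": "Atlanta", "departure_city": None})
print(ctx.supports("flights"), ctx.supports("hotels"))
print(ctx.resolve("departure_city", user_fallback="home_city"))
```

Separating the stores makes it explicit which facts last only for one conversation and which persist with the user or the application.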

Slide 36

Slide 36 text

The Core Technology: Text to Speech (TTS)
The response generated by the bot intelligence is spoken back to the user.

Slide 37

Slide 37 text

User Experience Principles of VUIs
• Define the job your bot will do
• Map the conversation flow
• Help users create a mental model
• Define how the machine handles the job

Slide 38

Slide 38 text

USER EXPERIENCE OF VUIS
What job is your bot being hired to do?
https://www.intercom.com/books/jobs-to-be-done

Slide 39

Slide 39 text

Model the Conversation Flow
User: “Show me flights to Atlanta leaving next Friday after 4pm.”
Bot: “Sure thing! Would you like to fly first class, business class, or coach?”
User: “First class.”
Bot: “Classy! And would you like a meal on this flight?”
User: “Yes, of course.”
Bot: “Perfect! Here is a list of flights that match your criteria.”
Flow states:
• find flights
• first class
• first class, yes to meal
• departure city: Austin
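The script above is a slot-filling flow: the first utterance supplies the intent, destination, date, and time, and the bot asks one question per missing detail. A minimal Python sketch of that logic follows, under assumptions: the slot names (`cabin_class`, `meal`, and so on), the `FOLLOW_UPS` table, and `run_dialogue` are hypothetical; the user's answers arrive as already-understood strings; the departure city (Austin) is assumed to be known from context; and the friendly acknowledgments ("Sure thing!", "Classy!", "Perfect!") are left out.

```python
# Hypothetical slot-filling sketch of the flight-booking script.
# Each follow-up question is tied to the slot it fills, in the order asked.
FOLLOW_UPS = [
    ("cabin_class", "Would you like to fly first class, business class, or coach?"),
    ("meal", "And would you like a meal on this flight?"),
]
DONE = "Here is a list of flights that match your criteria."


def run_dialogue(slots: dict, answers: list) -> tuple:
    """Ask about each missing slot in turn, filling it from `answers`.

    Returns (bot_lines, filled_slots). Raises StopIteration if there are
    fewer answers than questions asked.
    """
    filled = dict(slots)  # copy so the caller's dict is not mutated
    replies = iter(answers)
    bot_lines = []
    for slot, question in FOLLOW_UPS:
        if filled.get(slot) is None:
            bot_lines.append(question)
            filled[slot] = next(replies)
    bot_lines.append(DONE)
    return bot_lines, filled


initial = {"intent": "FLIGHTSEARCH", "destination_city": "Atlanta",
           "date": "next Friday", "time": "after 4pm", "departure_city": "Austin"}
lines, slots = run_dialogue(initial, ["first class", "yes"])
for line in lines:
    print("Bot:", line)
```

Driving the questions from a table rather than hard-coded branches means a slot the user already volunteered (say, "coach" in the first utterance) is simply skipped.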

Slide 40

Slide 40 text

Give Structure to Your Bot
• Implementation model
• User’s mental model
• Representation model
https://www.nngroup.com/articles/mental-models/

Slide 41

Slide 41 text

Booking a Flight and a Hotel
• Can users book both a flight and a hotel with your product?
• Is it only flights? If so, and I have a problem with my flight, can I contact your bot?

Slide 42

Slide 42 text

Structure into Machine Language
“Show me flights from Austin to Atlanta leaving next Friday after 4pm.”
INTENT: FLIGHTSEARCH
• departure city: Austin
• destination city: Atlanta
• date: next Friday
• time: after 4pm
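The mapping from utterance to intent and slots can be imitated with a regular expression. This Python sketch is a deliberately toy stand-in for a real NLU service, which would use a trained model: the pattern, the slot names, and `parse` are hypothetical, and it recognizes only this one phrasing with single-word city names.

```python
import re
from typing import Optional

# Toy, rule-based stand-in for an NLU service: one regular expression that
# recognizes a single phrasing of the FLIGHTSEARCH intent and captures
# each slot in a named group.
FLIGHT_SEARCH = re.compile(
    r"flights\s+from\s+(?P<departure_city>\w+)"
    r"\s+to\s+(?P<destination_city>\w+)"
    r"\s+leaving\s+(?P<date>next\s+\w+)"
    r"\s+(?P<time>(?:before|after)\s+\d{1,2}\s*(?:am|pm))",
    re.IGNORECASE,
)


def parse(utterance: str) -> Optional[dict]:
    """Return {"intent": ..., slot: value, ...}, or None if nothing matched."""
    match = FLIGHT_SEARCH.search(utterance)
    if match is None:
        return None
    return {"intent": "FLIGHTSEARCH", **match.groupdict()}


print(parse("Show me flights from Austin to Atlanta leaving next Friday after 4pm."))
```

A real NLU engine would also normalize values (for example, turning “next Friday” into a calendar date), which this sketch does not attempt.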

Slide 43

Slide 43 text

User Interface Elements of VUIs
• Words, Words, Words
• SSML (Speech Synthesis Markup Language)
• Media
• Hardware

Slide 44

Slide 44 text

Words, Words, Words
• Write for speaking and listening
• Create a script and act it out

Slide 45

Slide 45 text

Voice, Tone, and Persona
• Voice is the quality of your words
• Tone is how the words are modulated for different situations
• Persona is the character that embodies the voice and tone

Slide 46

Slide 46 text

Speech Synthesis Markup Language (SSML)
• Style
• Emphasis
• Breaks
• Prosody
https://www.w3.org/TR/speech-synthesis/
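Emphasis, breaks, and prosody correspond to the `<emphasis>`, `<break>`, and `<prosody>` elements of the W3C SSML specification. The Python helpers below are a hypothetical sketch that assembles an SSML string from those elements; the helper names are invented here, and the sketch does not cover the “Style” item.

```python
from xml.sax.saxutils import escape

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def text(words: str) -> str:
    # Plain words must be XML-escaped (e.g. "&" becomes "&amp;").
    return escape(words)


def emphasis(words: str, level: str = "moderate") -> str:
    # <emphasis> asks the synthesizer to stress the enclosed words.
    # `level` is assumed to be a valid SSML value such as "strong"; not escaped.
    return f'<emphasis level="{level}">{escape(words)}</emphasis>'


def pause(ms: int) -> str:
    # <break> inserts a pause of the given length in milliseconds.
    return f'<break time="{ms}ms"/>'


def prosody(words: str, rate: str = "medium") -> str:
    # <prosody> changes delivery; this sketch only sets the speaking rate.
    return f'<prosody rate="{rate}">{escape(words)}</prosody>'


def speak(*fragments: str) -> str:
    # Wrap the already-built fragments in the SSML root element.
    return (
        f'<speak version="1.1" xmlns="{SSML_NS}" xml:lang="en-US">'
        + "".join(fragments)
        + "</speak>"
    )


ssml = speak(
    text("Classy! "),
    pause(300),
    text("And would you like a meal "),
    emphasis("on this flight", level="strong"),
    text("? "),
    prosody("Take your time.", rate="slow"),
)
print(ssml)
```

Voice platforms generally support only a subset of SSML and may expect a bare `<speak>` root without the version and namespace attributes, so the target platform's documentation should be checked before relying on any element.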

Slide 47

Slide 47 text

Media & Hardware
• Audio
• Screens and Sensors
• Hardware

Slide 48

Slide 48 text

THE FUTURE OF SPEAKING WITH MACHINES #givevoice

Slide 49

Slide 49 text

THE FUTURE OF SPEAKING WITH MACHINES
1. More emotional awareness
2. Social attitudes toward voice
3. Better integration of voice and visual
4. Improvements in underlying tech

Slide 50

Slide 50 text

HOW TO BUILD A VOICE INTERFACE IN TEN MINUTES #givevoice