
The History and Future of Speaking with Machines

Voxable
March 14, 2017

Ubiquitous voice interfaces like Google Home and Amazon Alexa became available only recently, but humanity's attempts to converse with machines go back centuries. We review the history of the effort to give machines a voice. That history provides context for the core concepts of modern conversational design, such as Natural Language Processing and Voice User Experience, and we introduce design and development concepts for building voice interfaces that work.

Transcript

  1. The History and Future of Speaking with Machines • By Matt Buck & Lauren Golembiewski • #givevoice
  2. Overview • The history of speaking with machines • The principles of conversational design • The future of speaking with machines • Demonstration of building a voice interface
  3. Roger Bacon • c. 1219 - c. 1292 • English philosopher and Franciscan friar • Proponent of the scientific method
  4. Brazen Heads • Automatons • Not real (obvs) • Could answer any question put to them • https://en.wikipedia.org/wiki/Speech_synthesis#History
  5. 1700s: Early speech synthesis • 1779: Christian Gottlieb Kratzenstein models the vocal tract • 1791: Wolfgang von Kempelen’s “acoustic-mechanical speech machine” • https://en.wikipedia.org/wiki/Wolfgang_von_Kempelen's_Speaking_Machine
  6. 1846: Euphonia • Created by Joseph Faber • Also played like an organ • Modeled an entire head • Spoke three languages • https://irrationalgeographic.wordpress.com/2009/06/24/joseph-fabers-talking-euphonia/
  7. 1969: John Robinson Pierce • One of the most influential engineers of the 20th century • Coined the term “transistor” • “There are strong reasons for believing that spoken English is… not recognizable phoneme by phoneme or word by word.” • https://en.wikipedia.org/wiki/Speech_recognition#History
  8. 1978: The Texas Instruments Speak & Spell • Used TI’s Solid State Speech • First toy to use synthesized speech • https://commons.wikimedia.org/wiki/File:TI_SpeakSpell_no_shadow.jpg
  9. 1986: IBM Tangora • Interest in speech recognition reignited by DARPA grants in the early 1970s • Tangora: a voice-activated word processor with a 20,000-word vocabulary
  10. 2016: Google WaveNet • Neural network capable of generating speech • Can mimic any human voice • Reduces the gap to human performance by over 50% • https://deepmind.com/blog/wavenet-generative-model-raw-audio/
  11. 2014-2016: Amazon Echo & Google Home • Ubiquitous voice interfaces • Always-on, react to “wake words” • Siri-like functionality • Control smart-home devices
  12. The History of Speaking with Machines • Roger Bacon’s Brazen Head • Early attempts at speech synthesis • The 20th century • Demonstration of building a voice interface
  13. The Core Technology: The Voice User Interface • Automatic Speech Recognition (ASR) • Natural Language Understanding (NLU) • Bot Intelligence • Text to Speech (TTS)
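
To make the four stages concrete, here is a minimal sketch of one conversational turn in Python. Every function name (transcribe, understand, decide, synthesize) is a hypothetical placeholder standing in for a service call, not any particular vendor's API.

    # Minimal sketch of the voice user interface pipeline described on the slide.
    # All function names here are hypothetical placeholders, not real library calls.

    def transcribe(audio_in: bytes) -> str:
        """Automatic Speech Recognition (ASR): audio in, text out."""
        raise NotImplementedError  # call a speech-to-text service here

    def understand(text: str) -> dict:
        """Natural Language Understanding (NLU): text in, intent and entities out."""
        raise NotImplementedError  # call an NLU service here

    def decide(intent: dict, context: dict) -> str:
        """Bot intelligence: choose a reply from the intent and conversation context."""
        raise NotImplementedError

    def synthesize(reply_text: str) -> bytes:
        """Text to Speech (TTS): text in, audio out."""
        raise NotImplementedError

    def handle_turn(audio_in: bytes, context: dict) -> bytes:
        """One conversational turn: ASR -> NLU -> bot intelligence -> TTS."""
        text = transcribe(audio_in)
        intent = understand(text)
        reply = decide(intent, context)
        return synthesize(reply)
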
  14. The Core Technology: Bot Intelligence • Bot intelligence manages the context of the user, the application, and the conversation.
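
Purely as an illustration, the three kinds of context named on the slide could be modeled as a simple structure; the field names and example contents below are assumptions for the sketch, not part of the talk.

    from dataclasses import dataclass, field

    @dataclass
    class ConversationContext:
        user: dict = field(default_factory=dict)          # e.g. name, preferences, home airport
        application: dict = field(default_factory=dict)   # e.g. channel, feature flags
        conversation: dict = field(default_factory=dict)  # e.g. current intent, filled slots, last prompt
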
  15. The Core Technology: Text to Speech (TTS) • The result generated by the intelligence is spoken back to the user.
  16. User Experience Principles of VUIs • Define the job your bot will do • Map the conversation flow • Help users create a mental model • Define how the machine handles the job
  17. User Experience of VUIs • What job is your bot being hired to do? • https://www.intercom.com/books/jobs-to-be-done
  18. Model the Conversation Flow • User: “Show me flights leaving Atlanta next Friday after 4pm.” • Bot: “Sure thing! Would you like to fly first class, business class, or coach?” • User: “First class.” • Bot: “Classy! And would you like a meal on this flight?” • User: “Yes, of course.” • Bot: “Perfect! Here is a list of flights that match your criteria.” • Flow annotations from the diagram: find flights; first class; first class, yes to meal; departure city: Austin
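
The flight example above is essentially slot filling: keep asking questions until every required detail is known, then answer. A rough sketch follows, with prompts taken from the slide and the slot names invented for illustration.

    # Rough slot-filling sketch of the flight conversation (slot names are illustrative).
    REQUIRED_SLOTS = {
        "cabin_class": "Would you like to fly first class, business class, or coach?",
        "meal": "And would you like a meal on this flight?",
    }

    def next_prompt(filled_slots: dict) -> str:
        """Return the next question to ask, or the final answer once all slots are filled."""
        for slot, prompt in REQUIRED_SLOTS.items():
            if slot not in filled_slots:
                return prompt
        return "Perfect! Here is a list of flights that match your criteria."

    print(next_prompt({}))                                       # asks about cabin class
    print(next_prompt({"cabin_class": "first"}))                 # asks about the meal
    print(next_prompt({"cabin_class": "first", "meal": True}))   # returns the results line
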
  19. Booking a Flight and a Hotel • Can users book a flight and a hotel with your product? • Is it only flights? If so, and I have a problem with my flight, can I contact your bot?
  20. Structure into Machine Language • “Show me flights from Austin to Atlanta leaving next Friday after 4pm.” • INTENT: FLIGHTSEARCH • Entities: departure city, destination city, date, time
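
The structured form of that utterance might look like the following; the exact field names and value formats depend on the NLU tool you use, so treat this as an illustrative assumption rather than a specific service's output.

    # One plausible machine-readable form of the utterance on the slide.
    parsed = {
        "intent": "FLIGHTSEARCH",
        "entities": {
            "departure_city": "Austin",
            "destination_city": "Atlanta",
            "date": "next Friday",   # a real system would resolve this to a calendar date
            "time": "after 4pm",     # and this to a time range
        },
    }
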
  21. User Interface Elements of VUIs • Words, Words, Words • SSML - Speech Synthesis Markup Language • Media • Hardware
  22. Voice, Tone, and Persona • Voice is the quality of your words • Tone is how the words are modulated for different situations • Persona is the character that embodies the voice and tone
  23. Speech Synthesis Markup Language (SSML) • Style • Emphasis • Breaks • Prosody • https://www.w3.org/TR/speech-synthesis/
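
As a small example of what SSML looks like in practice, the fragment below uses the emphasis, break, and prosody elements from the W3C specification linked on the slide; engine support varies, and the wording itself is invented for illustration.

    # A small SSML fragment held as a Python string; <emphasis>, <break>, and
    # <prosody> are standard SSML elements, though TTS engines differ in support.
    ssml = """
    <speak>
      Welcome to <emphasis level="strong">Voxable</emphasis>.
      <break time="500ms"/>
      <prosody rate="slow" pitch="low">How can I help you today?</prosody>
    </speak>
    """
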
  24. The Future of Speaking with Machines • 1. More emotional awareness • 2. Social attitudes toward voice • 3. Better integration of voice and visual • 4. Improvements in underlying tech