
The History and Future of Speaking with Machines

Voxable
March 14, 2017

Ubiquitous voice interfaces like Google Home and Amazon Alexa became available only recently, but humanity's attempts to converse with machines go back centuries. We review the history of the effort to give machines a voice. That history provides context for the core concepts of modern conversational design, such as Natural Language Processing and Voice User Experience, and we introduce design and development concepts for building voice interfaces that work.

Transcript

  1. The History and Future of Speaking with Machines • By Matt Buck & Lauren Golembiewski • #givevoice
  2. Overview • The history of speaking with machines • The principles of conversational design • The future of speaking with machines • Demonstration of building a voice interface
  3. Roger Bacon • c. 1219 - c. 1292 • English philosopher and Franciscan friar • Proponent of the scientific method
  4. Brazen Heads • Automatons • Not real (obvs) • Could answer any question put to them • https://en.wikipedia.org/wiki/Speech_synthesis#History
  5. 1700s: Early speech synthesis • 1779: Christian Gottlieb Kratzenstein models the vocal tract • 1791: Wolfgang von Kempelen’s “acoustic-mechanical speech machine” • https://en.wikipedia.org/wiki/Wolfgang_von_Kempelen's_Speaking_Machine
  6. 1846: Euphonia • Created by Joseph Faber • Also played like an organ • Modeled an entire head • Spoke three languages • https://irrationalgeographic.wordpress.com/2009/06/24/joseph-fabers-talking-euphonia/
  7. 1969: John Robinson Pierce • One of the most influential engineers of the 20th century • Coined the term “transistor” • “There are strong reasons for believing that spoken English is… not recognizable phoneme by phoneme or word by word.” • https://en.wikipedia.org/wiki/Speech_recognition#History
  8. 1978: The Texas Instruments Speak & Spell • Used TI’s Solid State Speech • First toy to use synthesized speech • https://commons.wikimedia.org/wiki/File:TI_SpeakSpell_no_shadow.jpg
  9. 1986: IBM Tangora • Interest in speech recognition reignited by DARPA grants in the early 1970s • Tangora: a voice-activated word processor with a 20,000-word vocabulary
  10. 2016: Google WaveNet • Neural network capable of generating speech • Can mimic any human voice • Reduces the gap to human performance by over 50% • https://deepmind.com/blog/wavenet-generative-model-raw-audio/
  11. 2014-2016: Amazon Echo & Google Home • Ubiquitous voice interfaces • Always-on, react to “wake words” • Siri-like functionality • Control smart-home devices
  12. The History of Speaking with Machines • Roger Bacon’s Brazen Head • Early attempts at speech synthesis • The 20th century • Demonstration of building a voice interface
  13. The Core Technology: The Voice User Interface • Automatic Speech Recognition (ASR) • Natural Language Understanding (NLU) • Bot Intelligence • Text to Speech (TTS)
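
To make the four stages concrete, here is a minimal sketch of one conversational turn in Python. Every function name (transcribe, understand, decide, synthesize) is a hypothetical placeholder standing in for a service call, not any particular vendor's API.

    # Minimal sketch of the voice user interface pipeline described on the slide.
    # All function names here are hypothetical placeholders, not real library calls.

    def transcribe(audio_in: bytes) -> str:
        """Automatic Speech Recognition (ASR): audio in, text out."""
        raise NotImplementedError  # call a speech-to-text service here

    def understand(text: str) -> dict:
        """Natural Language Understanding (NLU): text in, intent and entities out."""
        raise NotImplementedError  # call an NLU service here

    def decide(intent: dict, context: dict) -> str:
        """Bot intelligence: choose a reply from the intent and conversation context."""
        raise NotImplementedError

    def synthesize(reply_text: str) -> bytes:
        """Text to Speech (TTS): text in, audio out."""
        raise NotImplementedError

    def handle_turn(audio_in: bytes, context: dict) -> bytes:
        """One conversational turn: ASR -> NLU -> bot intelligence -> TTS."""
        text = transcribe(audio_in)
        intent = understand(text)
        reply = decide(intent, context)
        return synthesize(reply)
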
  14. The Core Technology: Bot Intelligence • Bot intelligence manages the context of the user, the application, and the conversation.
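
Purely as an illustration, the three kinds of context named on the slide could be modeled as a simple structure; the field names and example contents below are assumptions for the sketch, not part of the talk.

    from dataclasses import dataclass, field

    @dataclass
    class ConversationContext:
        user: dict = field(default_factory=dict)          # e.g. name, preferences, home airport
        application: dict = field(default_factory=dict)   # e.g. channel, feature flags
        conversation: dict = field(default_factory=dict)  # e.g. current intent, filled slots, last prompt
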
  15. The Core Technology: Text to Speech (TTS) • The result generated by the intelligence is spoken back to the user.
  16. User Experience Principles of VUIs • Define the job your bot will do • Map the conversation flow • Help users create a mental model • Define how the machine handles the job
  17. User Experience of VUIs • What job is your bot being hired to do? • https://www.intercom.com/books/jobs-to-be-done
  18. Model the Conversation Flow • User: “Show me flights leaving Atlanta next Friday after 4pm.” • Bot: “Sure thing! Would you like to fly first class, business class, or coach?” • User: “First class.” • Bot: “Classy! And would you like a meal on this flight?” • User: “Yes, of course.” • Bot: “Perfect! Here is a list of flights that match your criteria.” • Flow annotations from the diagram: find flights; first class; first class, yes to meal; departure city: Austin
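
The flight example above is essentially slot filling: keep asking questions until every required detail is known, then answer. A rough sketch follows, with prompts taken from the slide and the slot names invented for illustration.

    # Rough slot-filling sketch of the flight conversation (slot names are illustrative).
    REQUIRED_SLOTS = {
        "cabin_class": "Would you like to fly first class, business class, or coach?",
        "meal": "And would you like a meal on this flight?",
    }

    def next_prompt(filled_slots: dict) -> str:
        """Return the next question to ask, or the final answer once all slots are filled."""
        for slot, prompt in REQUIRED_SLOTS.items():
            if slot not in filled_slots:
                return prompt
        return "Perfect! Here is a list of flights that match your criteria."

    print(next_prompt({}))                                       # asks about cabin class
    print(next_prompt({"cabin_class": "first"}))                 # asks about the meal
    print(next_prompt({"cabin_class": "first", "meal": True}))   # returns the results line
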
  19. Booking a Flight and a Hotel • Can users book a flight and a hotel with your product? • Is it only flights? If so, and I have a problem with my flight, can I contact your bot?
  20. Structure into Machine Language • “Show me flights from Austin to Atlanta leaving next Friday after 4pm.” • INTENT: FLIGHTSEARCH • Entities: departure city, destination city, date, time
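
The structured form of that utterance might look like the following; the exact field names and value formats depend on the NLU tool you use, so treat this as an illustrative assumption rather than a specific service's output.

    # One plausible machine-readable form of the utterance on the slide.
    parsed = {
        "intent": "FLIGHTSEARCH",
        "entities": {
            "departure_city": "Austin",
            "destination_city": "Atlanta",
            "date": "next Friday",   # a real system would resolve this to a calendar date
            "time": "after 4pm",     # and this to a time range
        },
    }
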
  21. User Interface Elements of VUIs • Words, Words, Words • SSML - Speech Synthesis Markup Language • Media • Hardware
  22. Voice, Tone, and Persona • Voice is the quality of your words • Tone is how the words are modulated for different situations • Persona is the character that embodies the voice and tone
  23. Speech Synthesis Markup Language (SSML) • Style • Emphasis • Breaks • Prosody • https://www.w3.org/TR/speech-synthesis/
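
As a small example of what SSML looks like in practice, the fragment below uses the emphasis, break, and prosody elements from the W3C specification linked on the slide; engine support varies, and the wording itself is invented for illustration.

    # A small SSML fragment held as a Python string; <emphasis>, <break>, and
    # <prosody> are standard SSML elements, though TTS engines differ in support.
    ssml = """
    <speak>
      Welcome to <emphasis level="strong">Voxable</emphasis>.
      <break time="500ms"/>
      <prosody rate="slow" pitch="low">How can I help you today?</prosody>
    </speak>
    """
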
  24. The Future of Speaking with Machines • 1. More emotional awareness • 2. Social attitudes toward voice • 3. Better integration of voice and visual • 4. Improvements in underlying tech