Overview • The history of speaking with machines • Voice on the web • The future of speaking with machines • Demonstration of building a voice interface
1700s: Early speech synthesis • 1779: Christian Gottlieb Kratzenstein models the vocal tract • 1791: Wolfgang von Kempelen’s “acoustic-mechanical speech machine” https://en.wikipedia.org/wiki/Wolfgang_von_Kempelen's_Speaking_Machine
1846: Euphonia • Created by Joseph Faber • Played like an organ, from a keyboard • Modeled an entire head • Spoke three languages https://irrationalgeographic.wordpress.com/2009/06/24/joseph-fabers-talking-euphonia/
1969: John Robinson Pierce • One of the most influential engineers of the 20th century • Coined the term “transistor” • “There are strong reasons for believing that spoken English is… not recognizable phoneme by phoneme or word by word.” https://en.wikipedia.org/wiki/Speech_recognition#History
1978: The Texas Instruments Speak & Spell • Used TI’s Solid State Speech • First toy to use synthesized speech https://commons.wikimedia.org/wiki/File:TI_SpeakSpell_no_shadow.jpg
1986: IBM Tangora • Interest in speech recognition was reignited by DARPA grants in the early 1970s • Tangora: a voice-activated word processor with a 20,000-word vocabulary
2016: Google WaveNet • Neural network that generates raw audio waveforms, including speech • Can mimic any human voice • Reduces the gap between existing text-to-speech systems and human performance by over 50% https://deepmind.com/blog/wavenet-generative-model-raw-audio/
The Core Technology: The Voice User Interface • Automatic Speech Recognition (ASR) • Natural Language Understanding (NLU) • Bot Intelligence • Text to Speech (TTS)
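The four components form a pipeline: ASR turns audio into text, NLU turns that text into an intent plus entities, the bot logic decides on a reply, and TTS speaks it. A minimal Python sketch of that chain follows. Every stage is a hypothetical stub (`fake_asr`, `fake_nlu`, `fake_bot`, `fake_tts`) standing in for a real service, and "audio" is faked as UTF-8 bytes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class NLUResult:
    intent: str
    entities: Dict[str, str] = field(default_factory=dict)


@dataclass
class VoicePipeline:
    """Chains the four stages; each is a plain callable, so any stub
    can be swapped for a real service."""
    asr: Callable[[bytes], str]          # audio -> transcript
    nlu: Callable[[str], NLUResult]      # transcript -> intent + entities
    bot: Callable[[NLUResult], str]      # intent + entities -> reply text
    tts: Callable[[str], bytes]          # reply text -> audio

    def handle(self, audio: bytes) -> bytes:
        transcript = self.asr(audio)
        understood = self.nlu(transcript)
        reply = self.bot(understood)
        return self.tts(reply)


# Hypothetical stubs; none of these is a real ASR/NLU/TTS API.
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the audio is already text


def fake_nlu(text: str) -> NLUResult:
    # Stub only: a real NLU service would classify the intent properly.
    if "flights" in text.lower():
        return NLUResult("FLIGHTSEARCH", {"destination city": "Atlanta"})
    return NLUResult("UNKNOWN")


def fake_bot(result: NLUResult) -> str:
    if result.intent == "FLIGHTSEARCH":
        return f"Searching flights to {result.entities['destination city']}."
    return "Sorry, I didn't catch that."


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")  # pretend the text is synthesized audio


pipeline = VoicePipeline(fake_asr, fake_nlu, fake_bot, fake_tts)
print(pipeline.handle(b"Show me flights to Atlanta").decode("utf-8"))
```

Keeping each stage behind a simple callable mirrors how these systems are usually assembled from separate ASR, NLU, and TTS vendors.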
Designing a Voice Interface • Determine the intents of the system • Define entities for those intents • Model the conversational flow • Write scripts and do read-throughs
Determine Intents • Intents: goals the user wishes to accomplish • Examples: “I need to run this report”, “I need to find this person’s phone number” • Create a list of your users’ goals
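A list of user goals can be kept as a small intent inventory, with each intent paired with example utterances. The sketch below uses only the two example goals from this slide. The intent names (`RUN_REPORT`, `FIND_PHONE_NUMBER`) are hypothetical. Flattening the inventory into (utterance, intent) pairs is shown as one common shape for NLU training data, not any particular platform's format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Intent:
    name: str                  # hypothetical identifier for the goal
    goal: str                  # the user's goal in plain language
    examples: Tuple[str, ...]  # sample utterances expressing the goal


INTENTS = [
    Intent("RUN_REPORT", "Run a report",
           ("I need to run this report",)),
    Intent("FIND_PHONE_NUMBER", "Find a person's phone number",
           ("I need to find this person's phone number",)),
]


def training_pairs(intents: List[Intent]) -> List[Tuple[str, str]]:
    """Flatten the inventory into (utterance, intent name) pairs."""
    return [(utterance, intent.name)
            for intent in intents
            for utterance in intent.examples]


for utterance, name in training_pairs(INTENTS):
    print(f"{name:<18} {utterance}")
```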
Determine Entities • Example: “Show me flights from Austin to Atlanta leaving next Friday after 4pm.” • Intent: FLIGHTSEARCH • Entities: departure city (Austin), destination city (Atlanta), date (next Friday), time (after 4pm)
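To make the intent/entity split concrete, the sketch below pulls the four entities out of the example sentence into a `FlightSearch` record. It is a toy that uses a single regular expression tied to this one phrasing. An NLU platform learns entities from many annotated examples instead. The class, field, and function names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Toy pattern for exactly one phrasing; multi-word cities like "New York"
# are allowed, but any other sentence structure will not match.
FLIGHT_PATTERN = re.compile(
    r"flights from (?P<departure_city>[A-Z][a-z]+(?: [A-Z][a-z]+)*)"
    r" to (?P<destination_city>[A-Z][a-z]+(?: [A-Z][a-z]+)*)"
    r" leaving (?P<date>next \w+)"
    r" after (?P<time>\d{1,2}(?::\d{2})?\s?[ap]m)"
)


@dataclass
class FlightSearch:
    """Entities for the FLIGHTSEARCH intent."""
    departure_city: str
    destination_city: str
    date: str
    time: str


def parse_flight_search(utterance: str) -> Optional[FlightSearch]:
    match = FLIGHT_PATTERN.search(utterance)
    if match is None:
        return None  # this toy cannot handle other phrasings
    return FlightSearch(**match.groupdict())


print(parse_flight_search(
    "Show me flights from Austin to Atlanta leaving next Friday after 4pm."))
```

The brittleness is the point. Rephrase the request slightly and the pattern fails, which is why the entity definitions are normally handed to a trained NLU service rather than hand-written rules.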
Model the Conversation Flow • User: “Show me flights to Atlanta leaving next Friday after 4pm.” • Bot: “Sure thing! Would you like to fly first class, business class, or coach?” • User: “First class.” • Bot: “Classy! And would you like a meal on this flight?” • User: “Yes, of course.” • Bot: “Perfect! Here is a list of flights that match your criteria.” • Flow: find flights (departure city: Austin) → first class → first class, yes to meal
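The flow above is a form of slot filling. The system tracks which details a flight search still needs and asks for each missing one in turn. This minimal sketch assumes hypothetical slot names (`cabin_class`, `meal`) and a departure city already known from context. The user's replies are simulated from a dict rather than recognized from speech.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical slots the bot must fill before searching, in asking order.
PROMPTS = {
    "cabin_class": "Would you like to fly first class, business class, or coach?",
    "meal": "And would you like a meal on this flight?",
}


def next_missing_slot(slots: Dict[str, str]) -> Optional[str]:
    """Return the first slot in PROMPTS that has no value yet, or None."""
    for slot in PROMPTS:
        if slot not in slots:
            return slot
    return None


def run_dialogue(slots: Dict[str, str],
                 answers: Dict[str, str]) -> List[Tuple[str, str]]:
    """Ask for each missing slot in turn; `answers` simulates the user."""
    transcript = []
    while True:
        slot = next_missing_slot(slots)
        if slot is None:
            break
        transcript.append(("Bot", PROMPTS[slot]))
        transcript.append(("User", answers[slot]))
        slots[slot] = answers[slot]
    transcript.append(
        ("Bot", "Perfect! Here is a list of flights that match your criteria."))
    return transcript


# Entities from the opening request, plus a departure city assumed from context.
slots = {
    "departure_city": "Austin",
    "destination_city": "Atlanta",
    "date": "next Friday",
    "time": "after 4pm",
}
for speaker, line in run_dialogue(slots, {"cabin_class": "first class", "meal": "yes"}):
    print(f"{speaker}: {line}")
```

Because the bot only asks about slots that are still empty, a user who says “first class, with a meal” up front would skip straight to the results.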
The Pitfalls of Keyword Matching • Bot: “Hello there! Welcome to the conference. You can say: ‘show me a map’, ‘schedule’, or ‘contact organizers.’” • User: “Well, I don’t need help with the schedule, but where are the restrooms?” • Bot: “Great! Here’s the schedule…”
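A naive keyword matcher produces exactly this wrong answer. It spots “schedule” and ignores both the negation and the actual question. The small Python sketch below reproduces the failure. The keywords come from the welcome prompt, and the handler names are hypothetical.

```python
from typing import Optional

# Keywords from the welcome prompt, mapped to hypothetical handler names.
KEYWORDS = {
    "map": "show_map",
    "schedule": "show_schedule",
    "organizers": "contact_organizers",
}


def keyword_match(utterance: str) -> Optional[str]:
    """Return the handler for the first keyword found anywhere in the text."""
    text = utterance.lower()
    for keyword, handler in KEYWORDS.items():
        if keyword in text:
            return handler
    return None


print(keyword_match(
    "Well, I don't need help with the schedule, but where are the restrooms?"))
# -> show_schedule
```

The matcher has no notion of negation or of what the user is actually asking. It also has nothing to say about restrooms, because that word was never in its list.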
Natural Language Understanding • Can be built from scratch, but doing so is expensive • Existing platforms: API.AI (Google, now Dialogflow), Wit.ai (Facebook), LUIS (Microsoft), Watson Cognitive Services (IBM)
Expando • https://github.com/voxable-labs/expando • Template: (is it possible to|can I|how do I) return (something|an item) • Expands to: is it possible to return something • is it possible to return an item • can I return something • can I return an item • how do I return something • how do I return an item
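The expansion shown above is a Cartesian product over each parenthesized group of `|`-separated alternatives. The following is a minimal Python re-implementation of just that alternation syntax, written for illustration. It is not Expando's own code and does not cover any other features the tool may support. The `expand` function name is hypothetical.

```python
import itertools
import re
from typing import List

# Matches one (a|b|...) group and captures its body.
GROUP = re.compile(r"\(([^()]*)\)")


def expand(template: str) -> List[str]:
    """Expand every (a|b|...) group into all combinations, in order."""
    # re.split with one capture group alternates literal text and group
    # bodies: parts[0] literal, parts[1] group, parts[2] literal, ...
    parts = GROUP.split(template)
    choices = [
        part.split("|") if i % 2 == 1 else [part]
        for i, part in enumerate(parts)
    ]
    # product() varies the last group fastest, matching the slide's order.
    return ["".join(combo) for combo in itertools.product(*choices)]


for utterance in expand("(is it possible to|can I|how do I) return (something|an item)"):
    print(utterance)
```

One short template yields six training utterances here. The count grows multiplicatively with each added group, which is what makes this style of shorthand useful for feeding NLU platforms.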