Building Professional Voice AI with Vocode - PyData DE 2024

Dive into the world of AI voice agents with Vocode, the leading framework for creating interactive, voice-based AI assistants. In this talk, we'll explore how Vocode integrates speech-to-text, response generation, and speech synthesis APIs to create agents that not only speak but also understand and adapt to the nuances of human conversation. We'll discuss the challenges of teaching these agents the etiquette of real conversations, such as knowing when to pause, not interrupt, and conclude interactions. Plus, we'll showcase Vocode's LLM function-calling feature through a practical example: real-time appointment booking. Join us to uncover the secrets behind building AI voice agents that are as engaging and efficient as they are innovative.

Lev Konstantinovskiy

April 23, 2024

Transcript

  1. AGENDA
    1. Demo
    2. What is it good for?
    3. How does it work?
    4. What is missing?
  2. Demo Prompt
    Please tell me why Python is the best programming language ever for highly parallel, low-latency applications. For example, AI Voice Agents :)
  3. Business Use of AI Voice Agents
    • Book an appointment with someone
    • Follow-up after a missed appointment
    • First job interview
  4. Actively developed Apr-Aug 2023 by the vocodehq company.
    Now maintained by @arpagon. Many thanks for keeping it open source.
  5. Pipeline stages and example providers
    Telephone Network: Twilio, Vonage
    Speech to Text: Whisper, Deepgram
    LLM: Mixtral, Llama, OpenAI
    Text to Speech: Coqui, ElevenLabs
  6. Orchestration is hard
    • Receive audio
    • Send to speech-to-text API
    • Human talking?
    • Wait for transcript
    • Create new reply
    • Stop bot talking
    • Synthesize voice
    • Send one second of voice over phone
  7. When To End The Call? Text Embeddings
    GOODBYE_PHRASES = [
        "bye", "goodbye", "see you", "see you later",
        "talk to you later", "talk to you soon",
        "have a good day", "have a good night",
    ]
  8. When To Start Replying?
    Wait for 10 milliseconds of silence (Deepgram default). Doesn’t work well. Need a separate text+audio model.
  9. Do Something Instead of Replying
    Call an API to book an appointment. Solved by LLM Function Calling. Works sometimes. Even GPT-4 is struggling.
  10. Cons of Vocode
    - Missing functionality
      - Interruption prevention
      - Noise cancelling
      - Voicemail detection
    - Asyncio
      - Stepping on the hidden rakes all the time
    Great for prototypes, needs more work for prod.
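
Slide 6 ("Orchestration is hard") lists stages that have to run concurrently. A minimal sketch of that idea, assuming three asyncio workers connected by queues, might look like the following. Every name here (`transcriber`, `agent`, `synthesizer`, `run_pipeline`) is hypothetical, the stages only format strings instead of calling real speech or LLM APIs, and this is not Vocode's actual implementation.

```python
import asyncio
from typing import List, Optional


async def transcriber(audio_in: asyncio.Queue, transcripts: asyncio.Queue) -> None:
    # Stand-in for a speech-to-text API call; None marks the end of the call.
    while (chunk := await audio_in.get()) is not None:
        await transcripts.put(f"transcript({chunk})")
    await transcripts.put(None)


async def agent(transcripts: asyncio.Queue, replies: asyncio.Queue) -> None:
    # Stand-in for the LLM that creates a new reply for each transcript.
    while (text := await transcripts.get()) is not None:
        await replies.put(f"reply to {text}")
    await replies.put(None)


async def synthesizer(replies: asyncio.Queue, audio_out: List[str]) -> None:
    # Stand-in for text-to-speech; collects the "audio" to send over the phone.
    while (reply := await replies.get()) is not None:
        audio_out.append(f"audio({reply})")


async def run_pipeline(chunks: List[str]) -> List[str]:
    audio_in: asyncio.Queue = asyncio.Queue()
    transcripts: asyncio.Queue = asyncio.Queue()
    replies: asyncio.Queue = asyncio.Queue()
    audio_out: List[str] = []
    items: List[Optional[str]] = [*chunks, None]
    for item in items:
        audio_in.put_nowait(item)
    # All three stages run concurrently and hand work over through the queues.
    await asyncio.gather(
        transcriber(audio_in, transcripts),
        agent(transcripts, replies),
        synthesizer(replies, audio_out),
    )
    return audio_out
```

Calling `asyncio.run(run_pipeline(["hello"]))` pushes one chunk through all three stages. The sketch leaves out the hard parts the slide points at, such as stopping the bot when the human starts talking.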
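
Slide 7 ("When To End The Call?") pairs text embeddings with a list of goodbye phrases. One way to read that is: embed the latest transcript, compare it with each phrase, and end the call when the similarity is high. The sketch below assumes cosine similarity and a fixed threshold, both of which are illustrative choices. `toy_embed` is a character-trigram stand-in so the example runs offline; a real agent would plug in an embedding model instead.

```python
import math
from collections import Counter
from typing import Callable, Dict

GOODBYE_PHRASES = [
    "bye", "goodbye", "see you", "see you later",
    "talk to you later", "talk to you soon",
    "have a good day", "have a good night",
]

Vector = Dict[str, float]


def toy_embed(text: str) -> Vector:
    """Stand-in embedding: character-trigram counts (NOT a real model)."""
    padded = f"  {text.lower().strip()}  "
    return dict(Counter(padded[i:i + 3] for i in range(len(padded) - 2)))


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def is_goodbye(
    transcript: str,
    embed: Callable[[str], Vector] = toy_embed,
    threshold: float = 0.6,  # assumed value; tune it for a real embedding model
) -> bool:
    """True if the transcript is close to any goodbye phrase."""
    query = embed(transcript)
    return any(cosine(query, embed(p)) >= threshold for p in GOODBYE_PHRASES)
```

With a real embedding model, paraphrases such as "catch you later" could also match, which is the point of using embeddings rather than exact string comparison.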
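
Slide 8 ("When To Start Replying?") describes waiting for a short run of silence before the bot answers. The sketch below illustrates that rule on raw 16-bit PCM frames, assuming a simple energy threshold decides what counts as silence. The function names, the 10 ms frame size, and the threshold value are all assumptions for illustration; this is not how Deepgram implements endpointing.

```python
import math
from typing import Iterable, Optional, Sequence


def rms(frame: Sequence[int]) -> float:
    """Root-mean-square energy of one frame of PCM samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def end_of_speech_ms(
    frames: Iterable[Sequence[int]],
    frame_ms: int = 10,
    silence_ms: int = 10,
    energy_threshold: float = 500.0,  # assumed value, depends on the audio source
) -> Optional[int]:
    """Return the time in ms at which the speaker is judged to be done, or None."""
    heard_speech = False
    silent_for = 0
    for index, frame in enumerate(frames):
        if rms(frame) >= energy_threshold:
            heard_speech = True
            silent_for = 0  # any speech resets the silence timer
        elif heard_speech:
            silent_for += frame_ms
            if silent_for >= silence_ms:
                return (index + 1) * frame_ms
    return None
```

A speaker who pauses for a single frame mid-sentence already trips a 10 ms rule, while a longer window such as 30 ms waits for the real end of the utterance at the cost of extra latency. That trade-off is one reason a fixed silence timer "doesn't work well".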
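
Slide 9 ("Do Something Instead of Replying") relies on LLM function calling: the model returns a function name plus JSON arguments instead of text, and the application runs the matching code. The sketch below assumes an OpenAI-style tool description and a hypothetical `book_appointment` function; it is not Vocode's action API. Because the slide notes that this only "works sometimes", the dispatcher treats the model's output as untrusted and reports failures instead of raising.

```python
import json
from typing import Callable, Dict

# Tool description in the OpenAI-style JSON-schema format (an assumption here).
BOOK_APPOINTMENT_TOOL = {
    "type": "function",
    "function": {
        "name": "book_appointment",
        "description": "Book an appointment for the caller.",
        "parameters": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "start_time": {"type": "string", "description": "ISO 8601 start time"},
            },
            "required": ["name", "start_time"],
        },
    },
}


def book_appointment(name: str, start_time: str) -> str:
    # Hypothetical stand-in for a call to a real calendar or booking API.
    return f"Booked {name} at {start_time}"


ACTIONS: Dict[str, Callable[..., str]] = {"book_appointment": book_appointment}


def dispatch(tool_name: str, arguments_json: str) -> str:
    """Run the function the model asked for, tolerating bad model output."""
    action = ACTIONS.get(tool_name)
    if action is None:
        return f"Unknown action: {tool_name}"
    try:
        return action(**json.loads(arguments_json))
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON, or missing/unexpected arguments from the model.
        return f"Could not run {tool_name}: {exc}"
```

The returned string can be fed back to the model as the tool result, so the agent can either confirm the booking to the caller or ask again for the missing details.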