Giving voice to your agents, the Symfony AI way

Role: Technical Expert @SensioLabs
Focus: Breaking CIs, AI and code quality
Find me: @Guikingone · github.com/guikingone · guikingone.substack.com

03 · The UI problem

Your users don't want another form. They want to speak with your agent the way they talk to their friends.

contact_form.html.twig /* friction */

    ▢ first name               Marie
    ▢ last name                Dupont
    ▢ email                    m.dupont@…
    ▢ phone                    +33 6 … 12
    ▢ reason for contact       callback about order
    ▢ preferred callback slot  tomorrow, 3pm
    ▢ country                  France
    ▢ account id               4521
    ▢ captcha                  -

Chat input: same content, smaller intent, but the user still has to type every word and could forget information.

“Hi, Marie Dupont here. I’m calling about an issue with order 4521, any chance of a callback tomorrow around 3pm? You can reach me at 06 12 34 56 78.”

extracted →

    first name               Marie
    last name                Dupont
    email                    m.dupont@…
    phone                    +33 6 … 12
    reason for contact       callback about order
    preferred callback slot  tomorrow, 3pm
    country                  France
    account id               4521
    captcha                  -

04 · The promise

One API, multiple behaviors

01 One agent. The same domain logic, tools, memory. Nothing duplicated.
02 Two modalities. Text most of the time. Voice when available. Same brain, same API.
03 A transparent wrapper. STT + TTS glued to the agent you already have.

05 · The shape of an agent

Same agent. Same output. Different behavior.

The agent:

    LLM     gpt-4o · claude · …
    prompt  how it behaves
    tools   what it can do
    memory  what it remembers

TEXT

    "Hi, Marie Dupont here. I’m calling about an issue with order 4521, any chance of a callback tomorrow around 3pm?"
    → agent →
    "Got it, Marie. I’ve booked a callback on order 4521 for tomorrow at 3pm."

VOICE

    "Hi, Marie Dupont here. I’m calling about an issue with order 4521, any chance of a callback tomorrow around 3pm?"
    → STT (audio → text) → agent → TTS (text → audio) →
    "Got it, Marie. I’ve booked a callback on order 4521 for tomorrow at 3pm."

06 · Latency is a feature, not a bug

Where the seconds go.

user · said: "What's my balance?" → agent · speaking: "Your balance is $1,247."

    Phase       Step             Latency    Running total
    01 · STT    mic → server     ~80 ms
                VAD close        ~120 ms
                STT              ~250 ms    ~450 ms  (1/3 phases)
    02 · AGENT  LLM first token  ~400 ms    ~850 ms  (2/3 phases)
    03 · TTS    TTS first chunk  ~180 ms
                network out      ~70 ms     ~1100 ms (3/3 phases)
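The running totals on this slide are plain addition over the per-step figures. A minimal sketch of that arithmetic, using the approximate numbers from the slide; the `runningTotals()` helper is hypothetical and only exists for illustration:

```php
<?php
declare(strict_types=1);

/**
 * Cumulative latency at the end of each phase of a voice turn.
 *
 * @param array<string, array<string, int>> $phases phase => [step => milliseconds]
 *
 * @return array<string, int> phase => cumulative milliseconds
 */
function runningTotals(array $phases): array
{
    $totals = [];
    $elapsed = 0;

    foreach ($phases as $phase => $steps) {
        // Each phase adds the sum of its own steps to the time already spent.
        $elapsed += array_sum($steps);
        $totals[$phase] = $elapsed;
    }

    return $totals;
}

// Approximate figures taken from the slide.
$phases = [
    'STT' => ['mic → server' => 80, 'VAD close' => 120, 'STT' => 250],
    'AGENT' => ['LLM first token' => 400],
    'TTS' => ['TTS first chunk' => 180, 'network out' => 70],
];

$totals = runningTotals($phases);
// ['STT' => 450, 'AGENT' => 850, 'TTS' => 1100]
```

Measuring each step separately makes it obvious which phase to attack first when the total drifts.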

07 · Thinking in terms of budget

    < 500 ms   Feels instant  Instant, but what about intent?
    < 1000 ms  Feels natural  The baseline
    < 2000 ms  Feels slow     Slow tool calls, latency
    > 3000 ms  User hangs up  Network issues, failover, etc.

The measure is the first audible syllable, not the full response. The first sound out of the speaker: that's the number the user feels.
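The budget can be turned into a small classifier for monitoring. This is a sketch, not part of Symfony AI: `perceivedLatency()` is a hypothetical helper, and because the slide leaves the 2000 to 3000 ms range unlabelled, the code assumes it still counts as "slow".

```php
<?php
declare(strict_types=1);

/**
 * Maps the time to the first audible syllable onto the budget from the slide.
 */
function perceivedLatency(int $firstAudibleMs): string
{
    return match (true) {
        $firstAudibleMs < 500 => 'feels instant',
        $firstAudibleMs < 1000 => 'feels natural',
        // Assumption: the slide labels "< 2000 ms" as slow and "> 3000 ms" as
        // hang-up; the gap in between is treated as slow here.
        $firstAudibleMs <= 3000 => 'feels slow',
        default => 'user hangs up',
    };
}
```

With the figures from the latency slide, a turn that produces its first sound after about 1100 ms lands in the "feels slow" bucket, just past the natural baseline.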

08 · Detecting the unknown

Voice activity detection threshold

01 Speech start. Open the STT socket.
02 Speech end. Close after 400 ms of silence: the "turn".
03 Interrupt. If TTS is active when speech starts, cancel & restart (back to 01).
04 Turn closed. Hand the transcript to the agent. Wait for the next start.
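The four steps above can be sketched as a tiny state machine. This is an illustration under assumptions, not the Symfony AI implementation: the `TurnDetector` class, its event names and the idea of feeding it one boolean "is this frame speech?" per audio frame are all hypothetical.

```php
<?php
declare(strict_types=1);

final class TurnDetector
{
    private bool $inTurn = false;
    private ?int $lastSpeechAt = null;

    public function __construct(
        // Silence needed to close a turn; 400 ms is the value from the slide.
        private readonly int $silenceMs = 400,
    ) {
    }

    /**
     * Feed one audio frame and get back the events it triggers.
     *
     * @return list<string>
     */
    public function feed(int $nowMs, bool $isSpeech, bool $ttsActive = false): array
    {
        $events = [];

        if ($isSpeech) {
            if (!$this->inTurn) {
                $this->inTurn = true;
                if ($ttsActive) {
                    // 03: the user talks over the agent, cancel the TTS first.
                    $events[] = 'interrupt';
                }
                // 01: open the STT socket.
                $events[] = 'speech_start';
            }
            $this->lastSpeechAt = $nowMs;

            return $events;
        }

        // 02 + 04: enough silence closes the turn, the transcript goes to the agent.
        if ($this->inTurn && $nowMs - $this->lastSpeechAt >= $this->silenceMs) {
            $this->inTurn = false;
            $events[] = 'turn_closed';
        }

        return $events;
    }
}
```

The silence threshold is the knob to tune: a shorter value cuts latency but risks closing the turn while the user is only pausing.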

09 · Barge-in

Interrupting is polite.

    AGENT · TTS speaking…   USER · MIC listening   ↓ the user starts talking
    AGENT · TTS cut         USER · MIC speaking…   ↓ open the new turn
    AGENT · TTS resuming…   USER · MIC done        new turn

Pending TTS bytes are dropped, not buffered.

1 The user interrupts the agent. VAD fires the event.
2 The current session is cancelled. The TTS stream is killed.
3 A new turn opens. Once the STT / TTS turn is down, the agent answers.
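"Dropped, not buffered" is the part that is easy to get wrong. A minimal sketch of a playback queue that honours it; the `TtsPlayback` class is hypothetical and stands in for whatever streams audio chunks to the client:

```php
<?php
declare(strict_types=1);

final class TtsPlayback
{
    /** @var list<string> audio chunks waiting to be sent to the speaker */
    private array $pending = [];
    private bool $cancelled = false;

    public function enqueue(string $chunk): void
    {
        if ($this->cancelled) {
            // Chunks still arriving from the killed TTS stream are ignored.
            return;
        }

        $this->pending[] = $chunk;
    }

    /** Next chunk to play, or null when nothing is left. */
    public function next(): ?string
    {
        return array_shift($this->pending);
    }

    /** Barge-in: drop everything that was not played yet. Returns the number of dropped chunks. */
    public function cancel(): int
    {
        $dropped = count($this->pending);
        $this->pending = [];
        $this->cancelled = true;

        return $dropped;
    }
}
```

A new turn would get a fresh `TtsPlayback` instance, so audio from the interrupted answer can never leak into the next one.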

10 · The Symfony approach

01 The agent API must stay the same. The same Symfony you already know. Autowired services, no separate runtime to think about.
02 Configuration as glue. Semantic configuration glues STT and TTS together. One change, cache:clear, and continue.
03 Your existing stack. Doctrine and Redis for memory. Messenger for async. Monolog for logs and traces. Your stack as it already is.
04 PHP, almost end to end. Need real-time feedback? LiveComponents to the rescue, with a sprinkle of JavaScript.

11 · The stack

symfony/ai-bundle

    # config/packages/ai.yaml
    ai:
        platform:
            openai:
                api_key: '%env(OPENAI_API_KEY)%'

        agent:
            support:
                platform: 'ai.platform.openai'
                model: 'gpt-4o'

The same shape as messenger or mailer: configuration first, code second.
Declare every provider once under ai.platform. Reusable across agents.
Each agent picks a platform and a model. Autowired by name.

12 · Bootstrapping

    # ...

    final class FooService
    {
        public function __construct(
            private readonly AgentInterface $support,
        ) {}

        public function handle(string $userMessage): string
        {
            # ...

            $result = $this->support->call(new MessageBag(
                Message::ofUser($userMessage),
            ));

            return $result->asText();
        }
    }

Familiar shape. Autowired. Typed. Easy to test.

13 · The configuration

symfony/ai-bundle

    # config/packages/ai.yaml
    ai:
        platform:
            openai:
                api_key: '%env(OPENAI_API_KEY)%'
            elevenlabs:
                api_key: '%env(ELEVEN_LABS_API_KEY)%'

        agent:
            support:
                platform: 'ai.platform.openai'
                model: 'gpt-4o'
                speech:
                    speech_to_text_platform: 'ai.platform.openai'
                    stt_model: 'whisper'
                    text_to_speech_platform: 'ai.platform.elevenlabs'
                    tts_model: 'eleven_multilingual_v2'
                    tts_options:
                        voice: 'Dslrhjl3ZpzrctukrQSN'

Same config as before.
Add a second provider for TTS. Reuses the existing configuration.
The speech key turns $support into a voice agent. STT and TTS can come from different platforms. The bundle decorates your agent with SpeechAgent. Same $support handle, new behavior.

14 · Adding the ears

Audio is just content.

    # ...

    final class FooService
    {
        public function __construct(
            private readonly AgentInterface $support,
            private readonly Filesystem $filesystem,
        ) {}

        public function handle(string $base64encodedInput): string
        {
            // Decode the base64 payload into a temporary .wav file.
            $path = $this->filesystem->tempnam(sys_get_temp_dir(), 'audio-', '.wav');
            $this->filesystem->dumpFile($path, base64_decode($base64encodedInput));

            // Audio travels as the content of a regular user message.
            $result = $this->support->call(new MessageBag(
                Message::ofUser(Audio::fromFile($path)),
            ));

            return $result->asText();
        }
    }

The input arrives as a base64 string sent by the JavaScript (LiveComponents, for example).
We write it to a file to retrieve the bytes. A stream would work just as well.
Wrap the file in an Audio object and send it through the MessageBag.
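The slide code trusts the payload. In practice the base64 string comes from the browser, so it is worth rejecting garbage before it reaches the STT provider. A framework-free sketch of that step, using only PHP built-ins; `writeAudioToTempFile()` is a hypothetical helper, not part of Symfony AI:

```php
<?php
declare(strict_types=1);

/**
 * Decodes a base64 audio payload and writes it to a temporary file.
 *
 * @return string path of the temporary file
 */
function writeAudioToTempFile(string $base64Audio): string
{
    // Strict mode makes base64_decode() return false on invalid characters.
    $bytes = base64_decode($base64Audio, true);
    if (false === $bytes || '' === $bytes) {
        throw new InvalidArgumentException('The audio payload is not valid base64.');
    }

    $path = tempnam(sys_get_temp_dir(), 'audio-');
    if (false === $path || false === file_put_contents($path, $bytes)) {
        throw new RuntimeException('Unable to write the audio payload to a temporary file.');
    }

    return $path;
}
```

The caller is responsible for deleting the file once the agent has answered.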

15 · Adding the mouth

    # ...

    final class FooService
    {
        public function __construct(
            private readonly AgentInterface $support,
            private readonly Filesystem $filesystem,
        ) {}

        public function handle(string $base64encodedInput): string
        {
            $path = $this->filesystem->tempnam(sys_get_temp_dir(), 'audio-', '.wav');
            $this->filesystem->dumpFile($path, base64_decode($base64encodedInput));

            $result = $this->support->call(new MessageBag(
                Message::ofUser(Audio::fromFile($path)),
            ));

            // The result now carries audio: persist the spoken answer.
            $result->asFile('/tmp/answer.mp3');

            # ...

            // The text produced by the LLM is available in the metadata.
            return $result->getMetadata()->get('text');
        }
    }

We persist the file containing the answer. A stream would work just as well.
If needed, we could use a Mercure hub to push the content to a LiveComponent.

16 · The whole picture

    // 1. wrap any AgentInterface. same call(), same contract.
    $voice = new SpeechAgent(
        agent: $supportAgent,
        configuration: new SpeechConfiguration(
            sttModel: 'whisper',
            ttsModel: 'eleven_multilingual_v2',
            ttsOptions: ['voice' => 'Dslrhjl3ZpzrctukrQSN'],
        ),
        speechToTextPlatform: $openai,
        textToSpeechPlatform: $elevenlabs,
    );

1 Decorator. It wraps any AgentInterface. Same call(), same contract.

    // 2. audio in → audio out. tools, memory, prompt: untouched.
    $result = $voice->call(new MessageBag(
        Message::ofUser(Audio::fromFile('/tmp/answer.wav')),
    ));

2 Audio is content. Voice rides inside UserMessage as Audio. STT only runs when the latest user message has audio.

    // 3. bytes in the body, the LLM's text rides in metadata.
    $audio = $result->getContent();               // mp3 bytes
    $text = $result->getMetadata()->get('text');  // "Your balance is …"

3 Both legs are optional. STT-only, TTS-only, or both, decided by which platforms + models you hand to SpeechConfiguration.
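To make the decorator idea concrete without depending on the library, here is a self-contained sketch of the same pattern. Every name in it (`Agent`, `AudioClip`, `Reply`, `VoiceDecorator`) is hypothetical; it mirrors the three points above but is not how SpeechAgent is actually implemented.

```php
<?php
declare(strict_types=1);

final class AudioClip
{
    public function __construct(public readonly string $bytes)
    {
    }
}

final class Reply
{
    public function __construct(
        public readonly string $content,      // text, or audio bytes when TTS ran
        public readonly ?string $text = null, // the LLM text when content is audio
    ) {
    }
}

interface Agent
{
    public function call(string|AudioClip $input): Reply;
}

final class VoiceDecorator implements Agent
{
    /**
     * @param (Closure(AudioClip): string)|null $speechToText optional STT leg
     * @param (Closure(string): string)|null    $textToSpeech optional TTS leg
     */
    public function __construct(
        private readonly Agent $inner,
        private readonly ?Closure $speechToText = null,
        private readonly ?Closure $textToSpeech = null,
    ) {
    }

    public function call(string|AudioClip $input): Reply
    {
        // STT only runs when the input actually is audio.
        if ($input instanceof AudioClip && null !== $this->speechToText) {
            $input = ($this->speechToText)($input);
        }

        // The wrapped agent is untouched: same tools, memory and prompt.
        $reply = $this->inner->call($input);

        if (null === $this->textToSpeech) {
            return $reply;
        }

        // Audio goes in the body, the original text rides alongside it.
        return new Reply(($this->textToSpeech)($reply->content), $reply->content);
    }
}
```

Because the decorator implements the same interface as the agent it wraps, callers cannot tell the difference, which is what keeps "one API, multiple behaviors" true.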

17 · Picking the right provider

    capability       OpenAI             ElevenLabs                      Cartesia
    speech-to-text   whisper            scribe                          ink-2 / ink-whisper
    text-to-speech   gpt-4o-mini-tts    eleven_multilingual_v2          sonic-2
    voice catalogue  pre-defined voice  A lot + cloning                 Similar to ElevenLabs
    latency (TTS)    Acceptable         Acceptable                      Slow depending on payload
    Best for         Local STT          STT and TTS, expressive models  STT

ALSO SUPPORTED

    Gemini        STT · TTS
    OpenRouter    STT · TTS
    Cohere        STT
    Hugging Face  STT
    Azure         STT

SUGGESTION: OpenAI for STT · ElevenLabs for TTS

Wait, what if something breaks?

19 · 418, I'm a teapot

Your exception traces are about to be read aloud.

✕ WHAT YOU HAVE TODAY
"Error 422: Unprocessable Entity. Validation failed on field dot customer underscore id dot required."

✓ WHAT CALLERS DESERVE
"I couldn't find that account. Could you read me the order number again?"
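One way to get from the first quote to the second is a single translation point between exceptions and sentences. A minimal sketch: the `AccountNotFound` exception and the `speakableError()` helper are hypothetical, and the generic fallback sentence is a placeholder rather than wording from the talk.

```php
<?php
declare(strict_types=1);

// Hypothetical domain exception, used for illustration only.
final class AccountNotFound extends RuntimeException
{
}

/**
 * Turns an exception into a sentence that is safe to read aloud.
 * Technical details stay in the logs, never in the spoken answer.
 */
function speakableError(Throwable $error): string
{
    return match (true) {
        $error instanceof AccountNotFound => "I couldn't find that account. Could you read me the order number again?",
        // Placeholder wording for anything unexpected.
        default => 'Sorry, something went wrong on my side. Could you try again?',
    };
}
```

The original exception should still be logged in full; only the spoken sentence is sanitised.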

20 · Things will break

01 STT mis-hears. "charge my card" → "change my card." Tool guardrails save you.
02 Model stalls. First token doesn't arrive in 2s. You need a timeout and a filler.
03 TTS 429s. Quota blown mid-sentence. Fall back to a second voice provider.
04 Audio socket drops. Mobile networks do this. Reconnect and regenerate the file.
05 Tool throws. The agent will happily hallucinate a refund confirmation. Don't let it.
06 User interrupts. TTS is mid-sentence, they're already talking. You need barge-in.

21 · Handling the unknown

Expecting errors? Failover built-in.

    # config/packages/ai.yaml
    ai:
        platform:
            openai:
                api_key: '%env(OPENAI_API_KEY)%'
            elevenlabs:
                api_key: '%env(ELEVEN_LABS_API_KEY)%'
            failover:
                openai_elevenlabs:
                    platforms:
                        - 'ai.platform.openai'
                        - 'ai.platform.elevenlabs'
                    rate_limiter: 'limiter.failover_platform'
        # ...

        agent:
            support:
                platform: 'ai.platform.failover.openai_elevenlabs'
                model: 'gpt-4o'
                speech:
                    speech_to_text_platform: 'ai.platform.failover.openai_elevenlabs'
                    stt_model: 'whisper'
                    text_to_speech_platform: 'ai.platform.failover.openai_elevenlabs'
                    tts_model: 'eleven_multilingual_v2'
                    tts_options:
                        voice: 'Dslrhjl3ZpzrctukrQSN'

The failover platform. Wraps openai and elevenlabs. If one rate-limits or errors, the bundle falls back transparently. Your agent never knows.
One rate_limiter, your policy. A Symfony RateLimiter reference. Once the limit hits, calls shift to the next platform in the list.
Your agent, wired through. Point the agent at the failover platform instead of a single provider. The rest of the config is unchanged.
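Stripped of the configuration, failover boils down to "try the platforms in order, keep the first answer". A minimal sketch of that idea, with platforms modelled as plain callables; `callWithFailover()` is a hypothetical helper and deliberately ignores the rate limiter, so it is not the bundle's implementation.

```php
<?php
declare(strict_types=1);

/**
 * Calls each platform in order and returns the first successful answer.
 *
 * @param list<callable(string): string> $platforms
 */
function callWithFailover(array $platforms, string $input): string
{
    $lastError = null;

    foreach ($platforms as $platform) {
        try {
            return $platform($input);
        } catch (Throwable $error) {
            // Remember the failure and move on to the next platform.
            $lastError = $error;
        }
    }

    throw new RuntimeException('All platforms failed.', 0, $lastError);
}
```

Order matters: the first entry in the list is the one used in the nominal case, the others only see traffic when it fails.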

22 · Before you ship

// THE PRODUCTION CHECKLIST

01 What about observability? Nothing changes. The Symfony Profiler, the DataCollector, Monolog: every call is traced.
02 What about a provider going down? Declare a failover platform with two or more providers. The platform picks the live one, your agent never knows.
03 What about cache & cost? Decorate with the Cache platform: the same prompt twice is free. Stack Failover on top, you're future-ready.
04 What about a slow first token? It depends on the network; consider using the Cache platform or streaming the output.
05 What about a tool throwing? Every tool returns a structured, speakable error. The agent reads "I couldn't reach that account", never a stack trace.
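"Same prompt twice is free" is a memoisation wrapper around the provider call. A sketch of the idea with an in-memory array; the `CachedPlatform` class and its `invoke()` signature are hypothetical and much simpler than the bundle's Cache platform, which would rely on a real cache pool.

```php
<?php
declare(strict_types=1);

final class CachedPlatform
{
    /** @var array<string, string> answers keyed by a hash of model + prompt */
    private array $cache = [];

    /**
     * @param Closure(string, string): string $upstream the real (billed) provider call
     */
    public function __construct(private readonly Closure $upstream)
    {
    }

    public function invoke(string $model, string $prompt): string
    {
        // The NUL byte keeps "a" + "bc" and "ab" + "c" from colliding.
        $key = hash('sha256', $model."\0".$prompt);

        if (!array_key_exists($key, $this->cache)) {
            // Only a cache miss reaches the provider.
            $this->cache[$key] = ($this->upstream)($model, $prompt);
        }

        return $this->cache[$key];
    }
}
```

Caching only helps for repeated, deterministic prompts such as greetings or canned answers; anything that depends on live data needs a short lifetime or no cache at all.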

23 · Starting on the right foot

01 Define the agent. Pick a model. Wire the tools and the prompt. The shape you already know.
02 Add ears and a mouth. Plug an STT and a TTS provider through the SpeechAgent. Audio in, audio out.
03 Monitor and adapt. Watch first-audible latency. Listen to real users. Tune voice, prompts, providers.

Thanks. Questions?

Source code: github.com/symfony/ai

Deck by Loulier Guillaume
