Slide 1

Slide 1 text

Realtime API and Voice Agents
The Next Generation of Interactive AI Systems
Linus Beckhaus, @linusbeckhaus
Gen AI Developer

Slide 2

Slide 2 text

Realtime
- Low latency
- Audio-native multimodal interaction
- Specialized models
- Full feature support
- WebSocket or WebRTC
Diagram: Traditional Voice Pipeline (STT → Text Model → TTS) compared with Native Multimodal (Realtime) (Speech-to-Speech Model)

Slide 3

Slide 3 text

Realtime: Use cases
- "Actual" interactive chatbots
- Hands-free applications
- Digital assistants
- Robotics
- Audio-native systems (e.g., call centers)

Slide 4

Slide 4 text

Realtime API: Demo
- Conversation
- Session

Slide 5

Slide 5 text

OpenAI Realtime API: Conversation lifecycle
Client/server sequence diagram:
- User Input: events related to user input (speech, text, data)
- •••
- response.done: sending of the response (including transcript) is finished

Slide 6

Slide 6 text

OpenAI Realtime API: Conversation lifecycle
Client/server sequence diagram, events happening in the background between User Input and response.done:
- conversation.item.added
- conversation.item.done
- response.created
- response.output_item.added
- conversation.item.added
- response.content_part.added
- output_audio_buffer.started
- response.output_audio.done
- response.content_part.done
- conversation.item.done
- response.output_item.done
- (additional out-of-order transcription events)
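A client usually routes these server events by their `type` field. The following is a minimal TypeScript sketch, assuming events arrive as parsed JSON objects carrying a `type` string. The `ResponseTracker` class and its method names are hypothetical illustrations, not part of the Realtime API or the Agents SDK.

```typescript
// Hypothetical helper: tracks whether a response is in progress,
// using only event names that appear on the slide.
type ServerEvent = { type: string };

class ResponseTracker {
  private active = false;
  readonly seen: string[] = [];

  // Feed every parsed server event into the tracker.
  handle(event: ServerEvent): void {
    this.seen.push(event.type);
    switch (event.type) {
      case "response.created":
        this.active = true; // the server started generating a response
        break;
      case "response.done":
        this.active = false; // sending of the response is finished
        break;
      default:
        // Background events (conversation.item.added, response.content_part.done, ...)
        // are recorded but do not change the state in this sketch.
        break;
    }
  }

  isResponseInProgress(): boolean {
    return this.active;
  }
}
```

A UI could use `isResponseInProgress()` to show a "speaking" indicator, while `seen` helps when debugging the event order.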

Slide 7

Slide 7 text

OpenAI Realtime API: Input Events (WebRTC)
Client/server diagram, events emitted by the server:
- input_audio_buffer.speech_started: server has detected speech in the audio buffer
- input_audio_buffer.speech_stopped: server has detected the end of speech
- input_audio_buffer.committed: input audio buffer has been committed
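The three input events can be read as a small state machine on the client. This sketch assumes server-side turn detection delivers them in the order shown on the slide. The state names (`idle`, `speaking`, `awaiting_commit`) and the function `nextSpeechState` are hypothetical.

```typescript
// Hypothetical client-side view of the user's speech turn.
type SpeechState = "idle" | "speaking" | "awaiting_commit";

// Pure transition function: given the current state and a server event type,
// return the next state. Unknown events leave the state unchanged.
function nextSpeechState(state: SpeechState, eventType: string): SpeechState {
  switch (eventType) {
    case "input_audio_buffer.speech_started":
      return "speaking"; // server detected speech in the audio buffer
    case "input_audio_buffer.speech_stopped":
      return state === "speaking" ? "awaiting_commit" : state; // end of speech
    case "input_audio_buffer.committed":
      return "idle"; // buffer committed, the turn is handed over
    default:
      return state;
  }
}
```

Keeping the transition pure makes it easy to unit-test and to plug into whatever state container the application already uses.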

Slide 8

Slide 8 text

OpenAI Realtime API: Input Events (Text/Data)
Client/server diagram, events sent by the client:
- conversation.item.create: add data to the conversation context (messages, function calls, function call responses, full audio)
- response.create: trigger a response/processing
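Sending a text message therefore takes two client events: one to add the item, one to trigger the response. The sketch below builds both payloads. The payload shape follows the publicly documented Realtime API as understood at the time of writing and should be checked against the current API reference. The helper name `userTextEvents` is hypothetical.

```typescript
// Event that adds a user text message to the conversation context.
interface ConversationItemCreate {
  type: "conversation.item.create";
  item: {
    type: "message";
    role: "user";
    content: Array<{ type: "input_text"; text: string }>;
  };
}

// Event that asks the server to generate a response.
interface ResponseCreate {
  type: "response.create";
}

// Hypothetical helper: returns the two events in the order they should be sent.
function userTextEvents(text: string): [ConversationItemCreate, ResponseCreate] {
  return [
    {
      type: "conversation.item.create",
      item: {
        type: "message",
        role: "user",
        content: [{ type: "input_text", text }],
      },
    },
    { type: "response.create" },
  ];
}
```

Each event would then be serialized with `JSON.stringify` and sent over the open WebSocket or WebRTC data channel. Without the second event, the item is only added to the context and no response is generated.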

Slide 9

Slide 9 text

OpenAI Agents SDK: Voice Agents
- TypeScript client for the OpenAI Realtime API
- Abstraction of transport layer and session
- Full integration into the agentic framework
Python version: Realtime support in beta
https://openai.github.io/openai-agents-js/guides/voice-agents/

Slide 10

Slide 10 text

Voice Agents: Key Components
- RealtimeSession (interface to session): Transport, Context, History, Configuration, Agent
- RealtimeAgent (interface to agent): Tools, Instructions, Settings, Guardrails, Handoffs (RealtimeAgent, RealtimeAgent)

Slide 11

Slide 11 text

Voice Agents: Basic Setup
1. Create RealtimeAgent
2. Create RealtimeSession and set initial agent
3. Establish connection
4. Start server (npm run dev) and get talking
Use an ephemeral key:
- Time-limited
- Safe
- Configurable
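The ephemeral key is minted on a trusted server so that the long-lived API key never reaches the browser. The sketch below only builds the HTTP request. The endpoint path and body shape are assumptions based on the public voice agents quickstart and should be verified against the current documentation. The helper `buildClientSecretRequest` and the `ClientSecretRequest` type are hypothetical.

```typescript
// Plain description of the HTTP request; nothing is sent in this sketch.
interface ClientSecretRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: builds the request a backend would send to obtain
// a short-lived client key for a realtime session.
function buildClientSecretRequest(apiKey: string, model: string): ClientSecretRequest {
  return {
    // Assumed endpoint, check the current API reference before relying on it.
    url: "https://api.openai.com/v1/realtime/client_secrets",
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    // Assumed body shape: session type plus the model to use.
    body: JSON.stringify({ session: { type: "realtime", model } }),
  };
}
```

A backend route would pass this description to `fetch` and return only the resulting ephemeral key to the browser, which then uses it when the RealtimeSession establishes its connection (step 3).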

Slide 12

Slide 12 text

Configuration: Agents SDK
- Combination of session and agent
- Initial session configuration
- Agents have their own configuration
- Update the agent used by the RealtimeSession
Diagram: session.updateAgent, session.updated

Slide 13

Slide 13 text

Tool Calling
Extend LLM capabilities:
- Ground with code
- Perform deterministic actions
- Interface with other systems
Diagram: Tool Call flow, steps 1 to 6; tools include Code, Web, Database, Agents, API, …

Slide 14

Slide 14 text

Tool Calling: Custom tools
- Executable code with metadata
- tool wrapper
- Structured output
- Zod integration
DEMO
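Conceptually, a custom tool bundles metadata, argument validation, and the executable code. The following library-free sketch illustrates that idea only. It does not use the SDK's tool wrapper or Zod: a hand-written check stands in for the schema, and `ToolDefinition`, `getWeather`, and `callTool` are hypothetical names.

```typescript
// Hypothetical shape of a tool: metadata plus validation plus code.
interface ToolDefinition<Args> {
  name: string;
  description: string;
  parse(raw: unknown): Args; // stands in for a Zod schema in this sketch
  execute(args: Args): string;
}

const getWeather: ToolDefinition<{ city: string }> = {
  name: "get_weather",
  description: "Return the weather for a city.",
  parse(raw) {
    // The model supplies arguments as JSON, so they must be validated first.
    if (typeof raw === "object" && raw !== null) {
      const city = (raw as { city?: unknown }).city;
      if (typeof city === "string") {
        return { city };
      }
    }
    throw new Error("Invalid arguments: expected { city: string }");
  },
  execute: ({ city }) => `The weather in ${city} is sunny.`,
};

// Validate the raw JSON arguments, then run the tool's code.
function callTool<Args>(tool: ToolDefinition<Args>, rawJsonArgs: string): string {
  return tool.execute(tool.parse(JSON.parse(rawJsonArgs)));
}
```

In the SDK, the schema role is played by Zod, which also yields the parameter description the model sees, so the hand-written `parse` above would not be needed.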

Slide 15

Slide 15 text

Tool Calling: Using Agents as Tools
- Use a (sub-)agent in a tool
- Blocking, like all tool calls
- Agent can run server-side
- Not limited to RealtimeAgents
https://openai.github.io/openai-agents-js/guides/voice-agents/build/#delegation-through-tools

Slide 16

Slide 16 text

Tool Calling: Configuration
- Tools set on RealtimeAgent
  - Create new Agent
  - Update Agent Config
  - tools attribute
- Tool use behavior set on RealtimeSession
  - config.toolChoice attribute
  - Updates through underlying transport: session.transport.updateSessionConfig(…)
Diagram: session.updateAgent, session.updated

Slide 17

Slide 17 text

Tool Calling: Human-in-the-Loop
- Sensitive actions
- Pause execution until approval/rejection
- needsApproval option
  - boolean
  - async function using context
- Rejection handled automatically
  - Triggers response by default
Diagram: Tool Call with "Requires Approval" branch, steps 1 to 7
DEMO
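The approval flow can be pictured as a gate in front of the tool's code. This is a library-free sketch of the idea, not the SDK's implementation: `GuardedTool`, `runWithApproval`, and the `askHuman` callback are hypothetical, and only the `needsApproval` option name and its two forms (boolean or async function using context) come from the slide.

```typescript
// needsApproval is either a fixed boolean or an async decision based on context.
type NeedsApproval<Ctx, Args> =
  | boolean
  | ((context: Ctx, args: Args) => Promise<boolean>);

interface GuardedTool<Ctx, Args> {
  name: string;
  needsApproval: NeedsApproval<Ctx, Args>;
  execute(args: Args): Promise<string>;
}

// Hypothetical runner: pauses on askHuman when approval is required and
// returns a rejection message instead of executing the tool when denied.
async function runWithApproval<Ctx, Args>(
  tool: GuardedTool<Ctx, Args>,
  context: Ctx,
  args: Args,
  askHuman: (toolName: string, args: Args) => Promise<boolean>,
): Promise<string> {
  const required =
    typeof tool.needsApproval === "function"
      ? await tool.needsApproval(context, args)
      : tool.needsApproval;
  if (required && !(await askHuman(tool.name, args))) {
    return `Tool call ${tool.name} was rejected.`;
  }
  return tool.execute(args);
}
```

In this sketch the rejection message is returned as the tool result, which mirrors the slide's point that a rejection is handled automatically and still triggers a response by default.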

Slide 18

Slide 18 text

Tool Calling: Handling Rejection
Preventing a response on rejection:
- Lower-level sendFunctionCallOutput
- Manually send result

Slide 19

Slide 19 text

Agents SDK: Many more features
- Handoffs
- Guardrails
- MCP
- Tracing
- Turn detection

Slide 20

Slide 20 text

Thank you for your attention
Linus Beckhaus, @linusbeckhaus
Gen AI Developer