“Hello, AI?!” — Real-time Interactions with Language Models, Local and Offline-Capable

“Hello, AI?!” Real-time interactions with language models, local and offline-capable
Christian Liebel @christianliebel Consultant

Hello, it’s me.
Christian Liebel
– X: @christianliebel
– Bluesky: @christianliebel.com
– Email: christian.liebel@thinktecture.com
– Angular, PWA & Generative AI
– Slides: thinktecture.com/christian-liebel

Overview: Generative AI
– Text: OpenAI GPT, Claude Opus, …
– Audio/Music: Musico, Soundraw, …
– Images: Nano !, Firefly, …
– Video: Omni, Sora, …
– Speech: Whisper, tortoise-tts, …

Large Language Models

Multimodal Models

Multimodal Realtime Models DEMO

DEMO

Realtime Models
– Process speech input and output natively (transcription optional)
– Multiple languages and output voices are supported
– Models can detect emotions
– Can consume additional input (text, images, video)
– Tool/function calling is supported
– Voice Activity Detection (VAD) is activated automatically (the model waits for a period of silence before responding)
– Models can be interrupted
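The "waits for a period of silence" behaviour can be illustrated with a tiny energy-based detector. This is only a sketch under stated assumptions: hosted realtime APIs run their own voice activity detection, and the names `rms`, `createTurnDetector`, `threshold` and `silenceFrames` are hypothetical, not part of any API mentioned in the talk.

```typescript
// Hypothetical energy-based end-of-turn detector (illustration only).

/** Root mean square of one audio frame with samples in [-1, 1]. */
function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < frame.length; i++) sum += frame[i] * frame[i];
  return Math.sqrt(sum / frame.length);
}

interface VadOptions {
  threshold: number;     // RMS above this counts as speech (assumed value)
  silenceFrames: number; // consecutive quiet frames that end a turn
}

/** Returns a function that is fed frames and yields true exactly when a spoken turn has just ended. */
function createTurnDetector(options: VadOptions): (frame: Float32Array) => boolean {
  let speaking = false;
  let quietFrames = 0;
  return (frame) => {
    if (rms(frame) > options.threshold) {
      // Speech (or an interruption) resets the silence counter.
      speaking = true;
      quietFrames = 0;
      return false;
    }
    if (!speaking) return false; // silence before any speech is ignored
    quietFrames += 1;
    if (quietFrames < options.silenceFrames) return false;
    speaking = false;
    quietFrames = 0;
    return true;
  };
}
```

A longer `silenceFrames` value makes the assistant more patient but adds latency before it answers, which is the trade-off behind the bullet above.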

Realtime Models: Use Cases
– Natural language interfaces
– Smart form filling
– Navigation
– Voice assistants
– Phone agents
– Alternative input methods for accessibility (e.g., ticket machines)

Realtime Models
– OpenAI Realtime API
– Gemini Live API: Gemini 3.1 Flash Live Preview
https://developers.openai.com/api/docs/models

APIs
OpenAI Realtime API
– 57+ languages
– Supports speech, text and image input
– Supports speech and text output
– Supports WebRTC, WebSockets and SIP
– Agents SDK in TS, WebRTC integration is ~50 LOC
Gemini Live API
– 70+ languages
– Supports speech, text and video input
– Supports speech and text output
– Supports WebSockets
– Google Gen AI SDK

WebRTC: Web Real-Time Communication
– JavaScript API for real-time audio/video communication
– Supports data channels for data transfer
– Used by Google Meet, Microsoft Teams (web), …
– W3C Recommendation (web standard)
– Supported by all major browsers for several years (Chrome 27, Edge 15, Safari 11, Firefox 22)
https://webrtc.org/
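As a rough illustration of the data-channel bullet, the following sketch prepares a WebRTC offer that includes a data channel. It uses the standard `createDataChannel()`, `createOffer()` and `setLocalDescription()` methods, but takes the connection as a parameter through small stand-in interfaces (`PeerConnectionLike` and friends are hypothetical names) so it compiles without DOM typings. The channel label and everything about signalling with a concrete provider are assumptions and left out.

```typescript
// Structural stand-ins for the DOM types; in a browser an RTCPeerConnection
// instance is intended to be passed in as PeerConnectionLike.
interface DataChannelLike {
  label: string;
}
interface SessionDescriptionLike {
  type?: string;
  sdp?: string;
}
interface PeerConnectionLike {
  createDataChannel(label: string): DataChannelLike;
  createOffer(): Promise<SessionDescriptionLike>;
  setLocalDescription(description: SessionDescriptionLike): Promise<void>;
}

/** Creates a data channel, then an SDP offer, and applies the offer locally. */
async function prepareOffer(
  pc: PeerConnectionLike,
  channelLabel: string
): Promise<{ channel: DataChannelLike; offer: SessionDescriptionLike }> {
  // The channel is created before the offer so that it is negotiated in the SDP.
  const channel = pc.createDataChannel(channelLabel);
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  return { channel, offer };
}

// Browser usage (not executed here; "events" is a made-up label):
//   const { offer } = await prepareOffer(new RTCPeerConnection(), "events");
```

The remaining steps, sending `offer.sdp` to the remote side and applying its answer with `setRemoteDescription()`, depend on the service being used.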

Media Capture & Streams API: getUserMedia()
– JavaScript APIs for accessing media devices
– Captures video and/or audio input
– W3C Candidate Recommendation
– Supported by all major browsers for several years (Chrome 21, Edge 12, Safari 11, Firefox 17)
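A minimal sketch of requesting microphone input with `getUserMedia()`. The constraint names `echoCancellation`, `noiseSuppression` and `autoGainControl` are standard, but enabling all three is an assumed default for voice input, and `MediaDevicesLike`, `voiceConstraints` and `captureMicrophone` are hypothetical names introduced so the sketch compiles without DOM typings.

```typescript
// Structural stand-ins for the DOM types; in a browser, navigator.mediaDevices
// is intended to be passed in as MediaDevicesLike<MediaStream>.
interface AudioConstraintsLike {
  echoCancellation: boolean;
  noiseSuppression: boolean;
  autoGainControl: boolean;
}
interface StreamConstraintsLike {
  audio: AudioConstraintsLike;
  video: boolean;
}
interface MediaDevicesLike<TStream> {
  getUserMedia(constraints: StreamConstraintsLike): Promise<TStream>;
}

/** Audio-only constraints; the processing flags are an assumed default. */
function voiceConstraints(): StreamConstraintsLike {
  return {
    audio: { echoCancellation: true, noiseSuppression: true, autoGainControl: true },
    video: false,
  };
}

/** Resolves with the microphone stream, or null if the request is rejected. */
async function captureMicrophone<TStream>(
  devices: MediaDevicesLike<TStream>
): Promise<TStream | null> {
  try {
    return await devices.getUserMedia(voiceConstraints());
  } catch {
    // Rejected, for example, when the user denies permission or no device exists.
    return null;
  }
}

// Browser usage (not executed here):
//   const stream = await captureMicrophone(navigator.mediaDevices);
```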

Tool/Function Calling: Let’s change the world
Tool/function calling can be used to:
– extend the model’s knowledge by accessing custom data (customer data, articles, orders, wikis, postcode API, …)
– extend the model’s capabilities by executing real-world actions (navigate, send an SMS, update order status in a database, fill in a form, perform a web search, …)
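On the application side, tool calling roughly amounts to looking up a handler by name, running it with the arguments the model produced, and sending the result back. The sketch below assumes a generic call shape (`name` plus JSON-encoded `arguments`); real providers use their own event formats, and the `get_order_status` tool and its order data are made up for illustration.

```typescript
// Hypothetical tool registry; the shape of tool calls differs between providers.
type ToolHandler = (args: Record<string, unknown>) => unknown;

interface ToolCall {
  name: string;      // tool the model wants to run
  arguments: string; // JSON-encoded arguments as produced by the model
}

const toolRegistry = new Map<string, ToolHandler>();

// Example tool that extends the model's knowledge with custom (made-up) order data.
toolRegistry.set("get_order_status", (args) => {
  const orders: Record<string, string> = { "A-1": "shipped", "A-2": "processing" };
  const orderId = String(args["orderId"]);
  return { orderId, status: orders[orderId] ?? "unknown" };
});

/** Runs the requested tool and returns a JSON string to hand back to the model. */
function dispatchToolCall(call: ToolCall): string {
  const handler = toolRegistry.get(call.name);
  if (!handler) return JSON.stringify({ error: `Unknown tool: ${call.name}` });
  try {
    const args = JSON.parse(call.arguments) as Record<string, unknown>;
    return JSON.stringify(handler(args));
  } catch (error) {
    // Malformed arguments or a failing tool are reported instead of thrown,
    // so the model can react to the error.
    return JSON.stringify({ error: String(error) });
  }
}
```

Returning errors as data rather than throwing is a design choice of this sketch: it lets the model apologise or retry instead of ending the conversation.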

Tool/Function Calling: Foundation of Agentic AI
– ReAct & Tools & MCP
https://mcp.so

Tool/Function Calling DEMO

OpenAI Realtime API https://developers.openai.com/api/docs/pricing (25.06.2026)

OpenAI Realtime API
“Roughly $1.15 [input] / $4.61 [output] per hour of conversation at default settings”
https://handyai.substack.com/p/model-drop-gpt-realtime-2
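To get a feel for these numbers, the quoted rates can be turned into a back-of-the-envelope estimate. The sketch reads the figures as USD per hour of input audio and per hour of output audio, which is an assumption about the quoted source; actual billing is token-based and depends on settings. `estimateCostUsd` is a hypothetical helper name.

```typescript
// Rates quoted on the slide (USD per hour). Interpreting them as per hour of
// input audio and per hour of output audio is an assumption of this sketch.
const INPUT_USD_PER_HOUR = 1.15;
const OUTPUT_USD_PER_HOUR = 4.61;

/** Rough cost of a conversation, given minutes of user speech and model speech. */
function estimateCostUsd(inputMinutes: number, outputMinutes: number): number {
  if (inputMinutes < 0 || outputMinutes < 0) {
    throw new RangeError("Durations must not be negative");
  }
  return (
    (inputMinutes / 60) * INPUT_USD_PER_HOUR +
    (outputMinutes / 60) * OUTPUT_USD_PER_HOUR
  );
}

// Example: a call with 6 minutes of user speech and 4 minutes of model speech.
const exampleCost = estimateCostUsd(6, 4);
console.log(`Estimated cost: $${exampleCost.toFixed(2)}`);
```

Under this reading, output audio dominates the bill, so keeping model answers short would matter more than limiting user input.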

Drawbacks: Generative AI Cloud Providers
– Require a (stable) internet connection
– Subject to network latency and server availability
– Data is transferred to the cloud service
– Require a subscription

Local Realtime APIs DEMO

WebGPU
– Grants low-level access to the Graphics Processing Unit (GPU)
– Near-native performance for machine learning applications
– Supported by Chromium-based browsers on Windows and macOS from version 113, Safari 26, Firefox 141 on Windows, and Firefox 145 on macOS 26 Tahoe on Apple Silicon
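Because support still varies by browser and platform, an application would typically feature-detect WebGPU before loading a local model. The sketch below relies on the standard `navigator.gpu` entry point and its `requestAdapter()` method, but receives the navigator object as a parameter through hypothetical stand-in interfaces (`NavigatorLike`, `GpuLike`) so it compiles without WebGPU typings.

```typescript
// Structural stand-ins; in a browser, `navigator` is intended to be passed in.
interface GpuLike<TAdapter> {
  requestAdapter(): Promise<TAdapter | null>;
}
interface NavigatorLike<TAdapter> {
  gpu?: GpuLike<TAdapter>;
}

/** Resolves with a GPU adapter, or null if WebGPU is unavailable. */
async function getGpuAdapter<TAdapter>(
  nav: NavigatorLike<TAdapter>
): Promise<TAdapter | null> {
  if (!nav.gpu) return null; // the browser does not expose WebGPU at all
  // The API may exist and still yield no adapter, e.g. on unsupported hardware.
  return nav.gpu.requestAdapter();
}
```

When the result is null, an application could fall back to a cloud API or a slower CPU/WebAssembly path; which fallback is appropriate is outside the scope of this sketch.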

Local Realtime APIs
– Silero (VAD)
– Whisper (STT)
– SmolLM2-1.7B (LLM)
– Kokoro (TTS)
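The four components form a cascade: detect speech, transcribe it, generate a reply, and synthesize audio. The sketch below only shows that composition. The `VoicePipeline` interface and its method names are hypothetical, the stages are synchronous for brevity, and real integrations of Silero, Whisper, SmolLM2 and Kokoro would be asynchronous model calls with their own APIs.

```typescript
// Hypothetical stage signatures; each stage stands in for one model of the cascade.
interface VoicePipeline {
  detectSpeech(audio: Float32Array): boolean; // VAD
  transcribe(audio: Float32Array): string;    // STT
  generate(prompt: string): string;           // LLM
  synthesize(text: string): Float32Array;     // TTS
}

/** Runs one conversational turn; returns null when the VAD finds no speech. */
function runTurn(pipeline: VoicePipeline, audio: Float32Array): Float32Array | null {
  if (!pipeline.detectSpeech(audio)) return null; // skip the expensive stages
  const transcript = pipeline.transcribe(audio);
  const reply = pipeline.generate(transcript);
  return pipeline.synthesize(reply);
}
```

Each stage adds latency and loses information such as tone of voice, which is one motivation for end-to-end audio models.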

Local Realtime APIs End-to-end audio

Local Realtime APIs: LFM2-Audio-1.5B
– End-to-end audio foundation model
– English (+ Japanese model)
– Does not support tool calling out of the box
– Supports fine-tuning
– Use cases: conversational chat, meeting transcription, live translation, …
https://www.liquid.ai/blog/lfm2-audio-an-end-to-end-audio-foundation-model

Local Realtime APIs https://github.com/Liquid4All/liquid-audio

Local Realtime APIs DEMO

Summary
– Realtime models unlock new, exciting opportunities for natural language interfaces beyond chat boxes
– Bidirectional, multilingual, minimal latency
– Quality is good, but not perfect
– Local options have also arrived, but are still evolving
– Fun!
– No science fiction, try it today!

Thank you for your kind attention!
Christian Liebel
@christianliebel
christian.liebel@thinktecture.com
