Christian Liebel
June 25, 2026

“Hello, AI?!” — Real-time Interactions with Language Models, Local and Offline-Capable

Large Language Models (LLMs) have changed how we design software: instead of clicks and GUIs, natural language now dominates, not just via keyboard and text, but also by voice. In this session, you will learn how to integrate voice-enabled AI models directly into your application and control them in real time using your voice. Thanks to a combination of local AI models, you can also implement this locally, on your own device. In practical demos, Christian Liebel from Thinktecture will show you how to address selected LLMs by voice, link your functionalities to them, and develop smart, conversational interfaces.


Transcript

  1. “Hello, AI!?” Real-time interactions with language models, local and offline-capable

    Hello, it’s me. Christian Liebel
    - X: @christianliebel
    - Bluesky: @christianliebel.com
    - Email: christian.liebel@thinktecture.com
    - Angular, PWA & Generative AI
    - Slides: thinktecture.com/christian-liebel
  2. Overview: Generative AI

    - Text: OpenAI GPT, Claude Opus, …
    - Audio/Music: Musico, Soundraw, …
    - Images: Nano !, Firefly, …
    - Video: Omni, Sora, …
    - Speech: Whisper, tortoise-tts, …
  4. Realtime Models

    - Process speech input and output natively (transcription optional)
    - Multiple languages and output voices are supported
    - Models can detect emotions
    - Can consume additional input (text, images, video)
    - Tool/function calling is supported
    - Voice Activity Detection (VAD) is activated automatically (the model waits for a period of silence before responding)
    - Models can be interrupted
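The VAD behavior described on this slide (wait for a period of silence before responding) can be illustrated with a simple energy-based end-of-turn detector. This is only a sketch under stated assumptions, not how the hosted realtime models detect speech: the 20 ms frame length, the 500 ms silence window, and the energy threshold are illustrative values, and `rms` and `detectEndOfTurn` are hypothetical names.

```typescript
// Root-mean-square energy of one audio frame (samples in the range [-1, 1]).
function rms(frame: number[]): number {
  if (frame.length === 0) return 0;
  let sum = 0;
  for (const sample of frame) sum += sample * sample;
  return Math.sqrt(sum / frame.length);
}

// Returns the index of the frame at which the speaker's turn is considered
// finished, or -1 if no end of turn was found.
// Assumptions (illustrative only): fixed frame length, fixed energy threshold.
function detectEndOfTurn(
  frames: number[][],
  frameMs = 20,
  silenceMs = 500,
  threshold = 0.01,
): number {
  const framesNeeded = Math.ceil(silenceMs / frameMs);
  let heardSpeech = false;
  let silentRun = 0;
  let index = 0;
  for (const frame of frames) {
    if (rms(frame) >= threshold) {
      // Speech resets the silence counter, so short pauses do not end the turn.
      heardSpeech = true;
      silentRun = 0;
    } else if (heardSpeech) {
      silentRun++;
      if (silentRun >= framesNeeded) return index;
    }
    index++;
  }
  return -1;
}
```

With the default values, the turn ends once 25 consecutive quiet frames follow at least one frame of speech. Leading silence is ignored, so the detector does not fire before the user has said anything.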
  5. Realtime Models: Use Cases

    - Natural language interfaces
    - Smart form filling
    - Navigation
    - Voice assistants
    - Phone agents
    - Alternative input methods for accessibility (e.g., ticket machines)
  6. Realtime Models

    - OpenAI Realtime API
    - Gemini Live API: Gemini 3.1 Flash Live Preview

    https://developers.openai.com/api/docs/models
  7. APIs

    OpenAI Realtime API
    - 57+ languages
    - Supports speech, text and image input
    - Supports speech and text output
    - Supports WebRTC, WebSockets and SIP
    - Agents SDK in TS, WebRTC integration is ~50 LOC

    Gemini Live API
    - 70+ languages
    - Supports speech, text and video input
    - Supports speech and text output
    - Supports WebSockets
    - Google Gen AI SDK
  8. WebRTC: Web Real-Time Communication

    - JavaScript API for real-time audio/video communication
    - Supports data channels for data transfer
    - Used by Google Meet, Microsoft Teams (web), …
    - W3C Recommendation (web standard)
    - Supported by all major browsers for several years (Chrome 27, Edge 15, Safari 11, Firefox 22)

    https://webrtc.org/
  9. Media Capture & Streams API: getUserMedia()

    - JavaScript APIs for accessing media devices
    - Captures video and/or audio input
    - W3C Candidate Recommendation
    - Supported by all major browsers for several years (Chrome 21, Edge 12, Safari 11, Firefox 17)
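Capturing microphone input with `getUserMedia()` might look like the following sketch. The `MediaDevicesLike` interface and the function names are hypothetical and exist only so the example does not depend on DOM typings. In a browser, `navigator.mediaDevices` would be passed in. The chosen constraints (`echoCancellation`, `noiseSuppression`) are standard track constraints, but enabling them for a voice interface is an assumption, not something the slides prescribe.

```typescript
// Minimal structural types so the sketch does not depend on DOM typings.
interface AudioOptions {
  echoCancellation: boolean;
  noiseSuppression: boolean;
}
interface Constraints {
  audio: AudioOptions;
  video: boolean;
}
interface MediaDevicesLike<TStream> {
  getUserMedia(constraints: Constraints): Promise<TStream>;
}

// Audio-only constraints for a voice interface (no camera access requested).
function voiceConstraints(): Constraints {
  return {
    audio: { echoCancellation: true, noiseSuppression: true },
    video: false,
  };
}

// Asks for microphone access and resolves with the captured stream.
// The promise rejects if the user denies the permission prompt.
async function captureMicrophone<TStream>(
  devices: MediaDevicesLike<TStream>,
): Promise<TStream> {
  return devices.getUserMedia(voiceConstraints());
}
```

In a page served over HTTPS, `captureMicrophone(navigator.mediaDevices)` would trigger the browser's permission prompt. The resulting stream's audio track could then be handed to a WebRTC peer connection.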
  10. Tool/Function Calling: Let’s change the world

    Tools/function calling can be used to...
    - extend the model’s knowledge by accessing custom data (customer data, articles, orders, wikis, postcode API, …)
    - extend the model’s capabilities by executing real-world actions (navigate, send an SMS, update order status in a database, fill in a form, perform a web search, …)
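The two uses of tool calling named on this slide can be sketched as a small tool registry and dispatcher. This is a provider-neutral illustration, not the OpenAI or Gemini API: the `Tool` shape, the tool names `navigate` and `lookup_order_status`, and the in-memory order data are all hypothetical. Real APIs declare tools with a name, a description, and a JSON Schema for the parameters, and the model returns the arguments as JSON.

```typescript
// A tool as it might be declared to a model, plus its local handler.
interface Tool {
  name: string;
  description: string;
  parameters: {
    type: "object";
    properties: Record<string, { type: string }>;
    required: string[];
  };
  handler: (args: Record<string, unknown>) => string;
}

// Hypothetical custom data the model cannot know on its own.
const orders = new Map<string, string>([["A-1", "shipped"]]);

const tools: Tool[] = [
  {
    // Extends the model's capabilities: executes an action in the app.
    name: "navigate",
    description: "Navigate the app to a route.",
    parameters: { type: "object", properties: { route: { type: "string" } }, required: ["route"] },
    handler: (args) => `navigated to ${String(args["route"])}`,
  },
  {
    // Extends the model's knowledge: looks up custom data.
    name: "lookup_order_status",
    description: "Look up the status of an order.",
    parameters: { type: "object", properties: { orderId: { type: "string" } }, required: ["orderId"] },
    handler: (args) => orders.get(String(args["orderId"])) ?? "unknown order",
  },
];

// Dispatches a tool call requested by the model. The returned string would
// be sent back to the model as the tool result.
function callTool(name: string, argumentsJson: string): string {
  const tool = tools.find((t) => t.name === name);
  if (!tool) return `error: unknown tool "${name}"`;
  const args = JSON.parse(argumentsJson) as Record<string, unknown>;
  for (const key of tool.parameters.required) {
    if (!(key in args)) return `error: missing argument "${key}"`;
  }
  return tool.handler(args);
}
```

Returning errors as strings rather than throwing is a deliberate choice in this sketch: the model receives the message as a tool result and can ask the user for the missing detail.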
  11. Tool/Function Calling: Foundation of Agentic AI

    ReAct & Tools & MCP

    https://mcp.so
  12. OpenAI Realtime API

    https://developers.openai.com/api/docs/pricing (25.06.2026)
  13. OpenAI Realtime API

    “Roughly $1.15 [input] / $4.61 [output] per hour of conversation at default settings”

    https://handyai.substack.com/p/model-drop-gpt-realtime-2
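As a rough back-of-the-envelope aid, the quoted hourly figures can be turned into a per-conversation estimate. The sketch below assumes that both figures apply to the same hour of conversation and that cost scales linearly with duration. Neither assumption is confirmed by the source, and `estimateCostUsd` is a hypothetical helper.

```typescript
// Figures quoted on the slide, in USD per hour of conversation at default settings.
const INPUT_USD_PER_HOUR = 1.15;
const OUTPUT_USD_PER_HOUR = 4.61;

// Assumption: input and output costs add up for the same conversation hour,
// and the total scales linearly with the length of the conversation.
function estimateCostUsd(minutes: number): number {
  const hours = minutes / 60;
  return (INPUT_USD_PER_HOUR + OUTPUT_USD_PER_HOUR) * hours;
}
```

Under these assumptions, a full hour comes to about $5.76 and a 10-minute conversation to about $0.96.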
  14. Generative AI Cloud Providers: Drawbacks

    - Require a (stable) internet connection
    - Subject to network latency and server availability
    - Data is transferred to the cloud service
    - Require a subscription
  15. WebGPU

    - Grants low-level access to the Graphics Processing Unit (GPU)
    - Near native performance for machine learning applications
    - Supported by Chromium-based browsers on Windows and macOS from version 113, Safari 26, and Firefox 141 on Windows/Firefox 145 on macOS 26 Tahoe on Apple Silicon
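Because WebGPU support still varies by browser and platform, an application would typically feature-detect it before loading a local model. The sketch below uses the standard `navigator.gpu.requestAdapter()` entry point. The `NavigatorLike` type, the function names, and the `"wasm"` fallback label are assumptions made for illustration.

```typescript
// Minimal structural types so the sketch does not depend on WebGPU typings.
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

// True if the browser exposes the WebGPU entry point at all.
function supportsWebGpu(nav: NavigatorLike): boolean {
  return nav.gpu !== undefined;
}

// Picks an inference backend. requestAdapter() resolves with null when no
// suitable GPU adapter is available, even if the API itself is present.
async function pickBackend(nav: NavigatorLike): Promise<"webgpu" | "wasm"> {
  if (nav.gpu === undefined) return "wasm";
  const adapter = await nav.gpu.requestAdapter();
  return adapter ? "webgpu" : "wasm";
}
```

In a browser, the global `navigator` object would be passed in (possibly with a type cast, depending on the installed typings). Checking the adapter as well as the API matters because a browser can ship WebGPU while a given machine has no usable GPU.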
  16. Local Realtime APIs

    - Silero (VAD)
    - Whisper (STT)
    - SmolLM2-1.7B (LLM)
    - Kokoro (TTS)
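The four components on this slide suggest a cascaded pipeline: detect speech, transcribe it, generate a reply, and synthesize audio. How the demo actually wires them together is not stated in the transcript, so the following is only a sketch of that general pattern. The `Stages` interface and `runTurn` are hypothetical, and each stage is injected rather than bound to a concrete model runtime.

```typescript
// One function per stage. In the slide's stack these would be backed by
// Silero (VAD), Whisper (STT), SmolLM2-1.7B (LLM), and Kokoro (TTS).
interface Stages<TAudio> {
  containsSpeech(audio: TAudio): Promise<boolean>;
  transcribe(audio: TAudio): Promise<string>;
  generate(prompt: string): Promise<string>;
  synthesize(text: string): Promise<TAudio>;
}

// Runs one conversational turn. Resolves with null if no speech was detected,
// so the more expensive stages are skipped for silence or background noise.
async function runTurn<TAudio>(
  stages: Stages<TAudio>,
  input: TAudio,
): Promise<TAudio | null> {
  if (!(await stages.containsSpeech(input))) return null;
  const text = await stages.transcribe(input);
  const reply = await stages.generate(text);
  return stages.synthesize(reply);
}
```

Unlike an end-to-end audio model, a cascade like this passes text between stages, so each model can be swapped independently.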
  17. Local Realtime APIs: LFM2-Audio-1.5B

    - End-to-end audio foundation model
    - English (+ Japanese model)
    - Does not support tool calling out of the box
    - Supports fine-tuning
    - Use cases: conversational chat, meeting transcription, live translation, …

    https://www.liquid.ai/blog/lfm2-audio-an-end-to-end-audio-foundation-model
  18. Local Realtime APIs

    https://github.com/Liquid4All/liquid-audio
  19. Summary

    - Realtime models unlock new, exciting opportunities for natural language interfaces beyond chat boxes
    - Bidirectional, multilingual, minimum latency
    - Quality is good, but not perfect
    - Local options have also arrived, but are still evolving
    - Fun!
    - No science fiction, try it today!