Global AI Braunschweig: Semantic AI: A Language Model & an Embedding Model walk into a bar...

Semantic AI, as an evolution of Generative AI, can be the key to integrating AI into your own solutions. In this talk, Christian Weyer presents practical architecture patterns and approaches for using large and small language models like GPT or LLaMA, as well as embedding models, in modern software architectures. Key concepts such as Semantic Routing, Semantic Search & lightweight RAG, as well as Structured Output are demonstrated using an end-to-end system with multiple services and client applications. Developers and architects will gain a pragmatic overview of how these can be implemented in their own projects.

Christian Weyer

July 04, 2025

Transcript

  1. Semantic AI: A Language Model & an Embedding Model walk into a bar...

    Christian Weyer | Co-Founder & CTO | Thinktecture AG | [email protected]
  2. Our journey

    § Models for our software
    § Lightweight RAG
    § Semantic Routing
    § Observability
    § Structured Output / Tool Calling
  3. MODELS FOR OUR SOFTWARE
  4. Language Models understand and generate semantically rich human language, transforming it into text or structured data for both humans and machines. ⚠ Non-deterministic: the same input can lead to different outputs.

    Embedding Models capture semantic meaning by encoding human language into numerical vector representations, facilitating understanding, comparison, and retrieval for both humans and machines. ✅ Deterministic: the same input always results in the same embedding.

    Semantic AI 🫱 🫲 Generative AI
  5. API-based AI model integrations

    § Language & embedding models are part of end-to-end architectures
    § Embedding models can be run locally
      § Optimized for CPU
    § Language models are still hard to run locally
      § High GPU power
      § High VRAM
      § High memory bandwidth
  6. Classical applications & UIs, API-based data, document-based data
  7. Language-enabled “UIs”: Talk-to-TT
  8. C4 system context diagram

    § Various tech stacks
    § Docker-based distributed system
  9. PATTERN: LIGHTWEIGHT RAG
  10. Talking to documents (Retrieval-augmented generation)

    💡 Indexing / Embedding: .md, .docx, .pdf etc. → Cleanup & Split → Text Embedding (Embedding Model) → Save → Vector DB
    💡 Question Answering: “Lorem ipsum…?” → Question → Text Embedding (Embedding Model) → Query → Vector DB → Relevant Results + Question → LLM → Answer w/ sources
  11. Technical implementation: Lightweight RAG

    § Frameworks
      § LangChain
      § FastEmbed: lightweight & efficient for generating text embeddings
    § Embedding model
      § jinaai/jina-embeddings-v2-base-de (local), 768 dims
    § Vector store
      § PostgreSQL (pgvector)
    § LLM/SLM
      § Llama 3.3 70B on Cerebras (very fast)
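The indexing and question-answering flow of the Lightweight RAG pattern can be sketched with the standard library alone. This is a minimal illustration under loud assumptions, not the implementation shown in the talk: `embed` is a toy bag-of-words stand-in for a real embedding model (it matches words, not meaning), the in-memory list stands in for a pgvector table, and all function names and sample documents are hypothetical. The final prompt would be sent to the LLM, which is left out here.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a sparse bag-of-words vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def split(text, max_words=12):
    """Split cleaned-up text into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def index_documents(docs):
    """Indexing: chunk each document and store (source, chunk, vector) rows."""
    return [(name, chunk, embed(chunk))
            for name, text in docs.items()
            for chunk in split(text)]


def retrieve(store, question, k=2):
    """Query: rank stored chunks by similarity to the embedded question."""
    q = embed(question)
    ranked = sorted(store, key=lambda row: cosine(q, row[2]), reverse=True)
    return [(name, chunk) for name, chunk, _ in ranked[:k]]


def build_prompt(question, hits):
    """Assemble the LLM prompt from the question and the relevant results."""
    context = "\n".join(f"[{name}] {chunk}" for name, chunk in hits)
    return ("Answer using only the sources below and cite them.\n"
            f"{context}\n\nQuestion: {question}")


# Hypothetical sample data for illustration only.
docs = {
    "workshops.md": "Workshops on software architecture are offered every month",
    "pizza.md": "Pizza dough needs flour water yeast and salt",
}
store = index_documents(docs)
question = "Which workshops are offered?"
prompt = build_prompt(question, retrieve(store, question, k=1))
```

Swapping `embed` for a real embedding model and the list for a vector database changes the quality of retrieval, while the overall shape of the pipeline stays the same.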
  12. PATTERN: STRUCTURED OUTPUT
  13. Talking to APIs (Function / Tool calling)

    § Tool integration is being standardized with MCP

    Diagram: “When is CW available for a two-day workshop?”; System Prompt (+ employee data) + Schema (for structured output); Web API; Availability business logic
  14. Technical implementation: Structured Output

    § Frameworks
      § Pydantic
      § Instructor
    § Methodology
      § Schema with JSON Mode (not Function Calling)
    § SLM/LLM
      § Llama 3.3 70B on Cerebras (very fast)
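The "schema with JSON mode" idea can be illustrated as follows: the model is asked to reply with JSON only, and the reply is validated against a schema before any business logic runs. The slide names Pydantic and Instructor for this; the sketch below deliberately swaps in a plain standard-library dataclass so that it runs without dependencies. The schema fields, the retry loop, and the `call_model` callable are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AvailabilityRequest:
    """Hypothetical schema for a question about workshop availability."""
    employee: str
    days: int
    earliest: date

    @classmethod
    def from_json(cls, raw):
        """Parse and validate a JSON reply; raise ValueError if it does not fit."""
        data = json.loads(raw)
        if not isinstance(data, dict):
            raise ValueError("expected a JSON object")
        missing = {"employee", "days", "earliest"} - set(data)
        if missing:
            raise ValueError(f"missing fields: {sorted(missing)}")
        if not isinstance(data["employee"], str) or not data["employee"]:
            raise ValueError("employee must be a non-empty string")
        if type(data["days"]) is not int or data["days"] < 1:
            raise ValueError("days must be a positive integer")
        return cls(data["employee"], data["days"],
                   date.fromisoformat(data["earliest"]))


def parse_with_retry(call_model, prompt, attempts=3):
    """Ask the model until its reply validates, feeding errors back into the prompt.

    `call_model` is any callable taking a prompt string and returning text.
    """
    feedback = ""
    for _ in range(attempts):
        raw = call_model(prompt + feedback)
        try:
            return AvailabilityRequest.from_json(raw)
        except (ValueError, TypeError) as exc:  # JSONDecodeError is a ValueError
            feedback = f"\nYour last reply was invalid ({exc}). Reply with JSON only."
    raise RuntimeError("model never produced a valid reply")


# Usage with a fake model that fails once and then returns valid JSON.
replies = iter(["not json",
                '{"employee": "CW", "days": 2, "earliest": "2025-07-07"}'])
result = parse_with_retry(lambda prompt: next(replies), "When is CW available?")
```

The validated object, not the raw model text, is what would be passed on to the Web API and its availability logic.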
  15. PATTERN: SEMANTIC ROUTING
  16. Semantics-based decisions for user interactions

    “Lorem ipsum…?” → Guarding (e.g. prompt injection; fine-tuned NLP model) → Routing (selecting the correct target; embedding model) → Target: RAG | Target: API call | Target: … something else …
  17. Technical implementation: Semantic Guarding & Routing

    Guarding
    § Frameworks
      § llm-guard
      § Hugging Face Transformers
    § NLP model
      § deepset/deberta-v3-base-injection (local)

    Routing
    § Frameworks
      § semantic-routing
      § FastEmbed
    § Embedding model
      § intfloat/multilingual-e5-large (local), 1024 dims
    § Vector store
      § PostgreSQL (pgvector)
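The routing half of this pattern amounts to comparing the embedded user input with embedded example utterances per target and picking the closest one, with a fallback when nothing is similar enough. The sketch below assumes a toy bag-of-words vector in place of a real embedding model, an in-memory index in place of pgvector, and hypothetical route names, utterances, and threshold. Guarding with a fine-tuned classifier would run before this step and is not shown.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a sparse bag-of-words vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical targets with a few example utterances each.
ROUTES = {
    "rag": [
        "what does the documentation say about",
        "explain the content of this document",
    ],
    "api_call": [
        "when is someone available for a workshop",
        "book an expert for two days",
    ],
}


class Router:
    """Selects the target whose example utterance is closest to the input."""

    def __init__(self, routes, threshold=0.3):
        self.threshold = threshold
        # Embed every example once, up front.
        self.index = [(name, embed(utterance))
                      for name, utterances in routes.items()
                      for utterance in utterances]

    def route(self, query):
        """Return the best-matching route name, or None below the threshold."""
        q = embed(query)
        name, score = max(((n, cosine(q, v)) for n, v in self.index),
                          key=lambda pair: pair[1])
        return name if score >= self.threshold else None


router = Router(ROUTES)
target = router.route("When is CW available for a workshop?")
fallback = router.route("Tell me something funny")
```

With these sample routes, `target` is `"api_call"` and `fallback` is `None`. A real embedding model would also match paraphrases that share no words with the examples, which this toy vector cannot do.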
  18. PATTERN / SOLUTION: OBSERVABILITY
  19. Things can get… overwhelming
  20. Technical implementation: Observability

    § Methodology
      § OpenTelemetry (OTel)
    § Frameworks
      § OTel Python packages
      § Logfire SDK
    § Tools
      § Logfire, Langfuse
      § Any OTel-enabled system
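The core idea behind tracing a multi-step AI request is the span: a named, timed unit of work with attributes and a parent. The sketch below illustrates only that concept with the standard library. It is not the OpenTelemetry or Logfire API; the `Tracer` and `Span` classes here are hypothetical, and in a real system the OTel Python packages would create and export the spans.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """One timed unit of work, e.g. a retrieval step or a model call."""
    name: str
    parent: Optional[str]
    attributes: dict = field(default_factory=dict)
    duration_ms: float = 0.0


class Tracer:
    """Minimal in-memory tracer: nested spans are linked to their parent."""

    def __init__(self):
        self.finished = []   # spans in the order they completed
        self._stack = []     # spans that are currently open

    @contextmanager
    def span(self, name, **attributes):
        parent = self._stack[-1].name if self._stack else None
        current = Span(name, parent, dict(attributes))
        self._stack.append(current)
        start = time.perf_counter()
        try:
            yield current
        finally:
            # Record the duration even if the wrapped code raised.
            current.duration_ms = (time.perf_counter() - start) * 1000
            self._stack.pop()
            self.finished.append(current)


tracer = Tracer()
with tracer.span("request", route="rag"):
    with tracer.span("retrieve", k=2):
        pass  # vector search would happen here
    with tracer.span("generate", model="hypothetical-llm"):
        pass  # the LLM call would happen here
```

Each finished span carries its parent's name, so a backend can reconstruct the request tree and show where time was spent.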
  21. Typical Semantic AI patterns & solutions in end-to-end software engineering

    § Lightweight RAG
    § Structured Output
    § Semantic Guarding & Routing
    § Insightful Observability
  22. AI solutions are ≅10% AI and 100% software engineering.