BASTA! 2025: Semantic AI in Action: Architektur-Patterns für Language- und Embedding-Modelle (Architecture Patterns for Language and Embedding Models)

Semantic AI, as an evolution of Generative AI, can be the key to integrating AI into your own solutions. In this talk, Christian Weyer presents practical architecture patterns and approaches for using large and small language models such as GPT or Llama, as well as embedding models, in modern software architectures. Key concepts such as semantic routing, semantic search and local RAG, structured output, and observability are demonstrated using an end-to-end system with multiple services and client applications. Developers and architects get a pragmatic overview of how these concepts could be implemented in their own projects.

Christian Weyer

September 24, 2025

Transcript

  1. Semantic AI in Action: Architektur-Patterns für Language- und Embedding-Modelle

    Our journey:
    - Models for our software
    - Lightweight RAG
    - Semantic Routing
    - Observability
    - Structured Output / Tool Calling
  2. Language Models understand and generate semantically rich human language, transforming it into text or structured data for both humans and machines.

    ⚠ Non-deterministic: same input can lead to different outputs.

    Embedding Models capture semantic meaning by encoding human language into numerical vector representations, facilitating understanding, comparison, and retrieval for both humans and machines.

    ✅ Deterministic: same input always results in the same embedding.

    🫱 🫲 Semantic AI / Generative AI
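
The two embedding properties named on this slide, determinism and comparability, can be illustrated with a small sketch. It assumes a toy bag-of-words function, `toy_embed`, as a stand-in for a real embedding model; it only captures word overlap, not meaning, and both function names are hypothetical.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a sparse vector of word counts.

    A real embedding model returns a dense vector (e.g. 768 or 1024 floats)
    that captures meaning; this one only captures which words occur.
    """
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


query = toy_embed("reset my password")

# Deterministic: embedding the same input again gives the same vector.
same_again = toy_embed("reset my password")

# Comparable: texts that share content score higher than unrelated ones.
related = cosine(query, toy_embed("How do I reset my password?"))
unrelated = cosine(query, toy_embed("Book a workshop"))
```

Because the vector for a given text never changes, embeddings can be computed once at indexing time and stored, which is what the retrieval and routing patterns later in the deck rely on.
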
  3. API-based AI model integrations

    - Language & embedding models part of end-to-end architectures
    - Embedding models can be run locally
      - Optimized for CPU
    - Language models still hard to run locally
      - High GPU power
      - High VRAM
      - High memory bandwidth
  4. C4 system context diagram

    - Various tech stacks
    - Docker-based distributed system
  5. Talking to documents (Retrieval-augmented generation)

    [Diagram with two flows; arrow order reconstructed from the slide labels]
    - 💡 Indexing / Embedding: .md, .docx, .pdf etc. → Cleanup & Split → Text → Embedding Model → Embedding → Save → Vector DB
    - 💡 Question Answering: Question (“Lorem ipsum…?”) → Text → Embedding Model → Embedding → Query → Vector DB → Relevant Results + Question → LLM → Answer w/ sources
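
The indexing and question-answering flows on this slide can be sketched end to end in a few lines. This is a minimal sketch under loud assumptions: `embed` is a toy bag-of-words stand-in for an embedding model, the "vector DB" is a plain Python list, and the final LLM call is left out, so the code stops at the prompt that would be sent. All names and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str      # file the chunk came from, kept so answers can cite it
    text: str
    vector: Counter  # toy embedding of the chunk text


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: bag-of-words counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def split(text: str) -> list[str]:
    """Cleanup & split: one chunk per non-empty paragraph."""
    return [part.strip() for part in text.split("\n\n") if part.strip()]


def index(docs: dict[str, str]) -> list[Chunk]:
    """Indexing flow: split each document, embed each chunk, save it."""
    return [
        Chunk(source, part, embed(part))
        for source, text in docs.items()
        for part in split(text)
    ]


def retrieve(store: list[Chunk], question: str, k: int = 2) -> list[Chunk]:
    """Question flow, step 1: embed the question and query the store."""
    q = embed(question)
    return sorted(store, key=lambda c: cosine(q, c.vector), reverse=True)[:k]


def build_prompt(question: str, hits: list[Chunk]) -> str:
    """Question flow, step 2: relevant results plus question go to the LLM."""
    context = "\n".join(f"[{c.source}] {c.text}" for c in hits)
    return (
        "Answer using only the context below and cite the sources.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


docs = {
    "workshops.md": "Workshops last two days.\n\nWorkshops are held on site or remotely.",
    "billing.md": "Invoices are sent monthly.",
}
store = index(docs)
question = "How long do workshops last?"
hits = retrieve(store, question, k=1)
prompt = build_prompt(question, hits)
```

Keeping the source name on every chunk is what makes the "answer with sources" step possible: the model only sees what was retrieved, and each retrieved passage carries its origin.
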
  6. Technical implementation – Lightweight RAG

    - Frameworks
      - LangChain
      - FastEmbed: lightweight & efficient for generating text embeddings
    - Embedding model
      - jinaai/jina-embeddings-v2-base-de (local) – 768 dims
    - Vector store
      - PostgreSQL (pgvector) vector store
    - LLM/SLM
      - Llama 3.3 70B on Cerebras (very fast)
  7. Talking to APIs (Function / Tool calling)

    [Diagram] “When is CW available for a two-days workshop?” → System Prompt (+ employee data) + Schema (for structured output) → Web API → Availability business logic

    - Tools integration (and more) is being standardized with MCP
  8. Technical implementation – Structured Output

    - Frameworks
      - Pydantic
      - Instructor
    - Methodology
      - Schema with JSON Mode (not Function Calling)
    - SLM/LLM
      - Llama 3.3 70B on Cerebras (very fast)
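
The "schema with JSON mode" approach boils down to: ask the model for JSON, validate the reply against a schema, and ask again if validation fails. The sketch below illustrates that loop with the standard library only so it runs without dependencies; in the stack named on the slide, Pydantic would take over the validation and Instructor the retry handling. The `AvailabilityRequest` fields and the simulated model replies are assumptions made up for illustration, not taken from the talk.

```python
import json
from dataclasses import dataclass
from datetime import date


@dataclass
class AvailabilityRequest:
    """Hypothetical schema for the workshop availability question."""
    employee: str
    days: int
    earliest_start: date

    @classmethod
    def from_json(cls, raw: str) -> "AvailabilityRequest":
        """Parse and validate one JSON reply; raise ValueError if it does not fit."""
        data = json.loads(raw)  # invalid JSON raises a ValueError subclass
        if not isinstance(data, dict):
            raise ValueError("reply must be a JSON object")
        missing = {"employee", "days", "earliest_start"} - data.keys()
        if missing:
            raise ValueError(f"missing fields: {sorted(missing)}")
        if type(data["days"]) is not int or data["days"] < 1:
            raise ValueError("days must be a positive integer")
        return cls(
            employee=str(data["employee"]),
            days=data["days"],
            earliest_start=date.fromisoformat(data["earliest_start"]),
        )


def parse_with_retry(replies, max_attempts: int = 3) -> AvailabilityRequest:
    """Validate replies in order, the way a retry loop would re-ask the model."""
    last_error = None
    for raw in list(replies)[:max_attempts]:
        try:
            return AvailabilityRequest.from_json(raw)
        except (ValueError, TypeError) as err:
            last_error = err  # a real loop would feed this back to the model
    raise ValueError(f"no valid reply after {max_attempts} attempts: {last_error}")


# Simulated model output: the first reply is incomplete, the second is valid.
simulated_replies = [
    '{"employee": "CW"}',
    '{"employee": "CW", "days": 2, "earliest_start": "2025-10-01"}',
]
request = parse_with_retry(simulated_replies)
```

Once the reply is a typed object rather than free text, the downstream Web API and its business logic can be called like any other service.
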
  9. Semantics-based decisions for user interactions

    [Diagram] “Lorem ipsum…?” → Guarding (e.g. prompt injection) → Routing (selecting correct target) → Target

    - Guarding uses a Fine-tuned NLP Model
    - Routing uses an Embedding Model
    - Targets: RAG, API Call, … something else …
  10. Technical implementation – Semantic Guarding & Routing

    Guarding
    - Frameworks
      - llm-guard
      - HuggingFace Transformers
    - NLP model
      - deepset/deberta-v3-base-injection (local)

    Routing
    - Frameworks
      - semantic-routing
      - FastEmbed
    - Embedding model
      - intfloat/multilingual-e5-large (local) – 1024 dims
    - Vector store
      - PostgreSQL (pgvector)
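
The routing half of this slide can be sketched as a nearest-neighbour lookup: each target gets a few example utterances, the incoming query is embedded, and the target with the most similar utterance wins unless the score is too low. This is a sketch under stated assumptions, not the API of the routing library named above: `embed` is a toy bag-of-words stand-in for the embedding model, and the route names, example utterances, and the 0.3 threshold are hypothetical. Guarding is not covered, since it depends on a fine-tuned classifier.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: bag-of-words counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical targets, each described by a few example utterances.
ROUTES = {
    "rag": [
        "what does the documentation say about",
        "find information in the documents",
    ],
    "api_call": [
        "when is someone available for a workshop",
        "check availability of an employee",
    ],
}

# Utterances are embedded once up front; deterministic embeddings make this safe.
ROUTE_VECTORS = {
    name: [embed(utterance) for utterance in utterances]
    for name, utterances in ROUTES.items()
}


def route(query: str, threshold: float = 0.3) -> str:
    """Return the best-matching target, or "fallback" below the threshold."""
    q = embed(query)
    best_name, best_score = "fallback", 0.0
    for name, vectors in ROUTE_VECTORS.items():
        score = max(cosine(q, v) for v in vectors)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else "fallback"


workshop_target = route("When is CW available for a two-days workshop?")
docs_target = route("What does the documentation say about pgvector?")
other_target = route("Tell me a joke")
```

The routing decision needs no language model call at all, only an embedding and a similarity search, which is why it can run locally and quickly in front of the more expensive targets.
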
  11. Technical implementation - Observability

    - Methodology
      - Open Telemetry (OTel)
    - Frameworks
      - OTel Python packages
      - LogFire SDK
    - Tools
      - LogFire, LangFuse
      - Any OTel-enabled system
  12. Typical Semantic AI patterns & solutions – in end-to-end software engineering

    - Lightweight RAG
    - Structured Output
    - Semantic Guarding & Routing
    - Insightful Observability
  13. AI solutions are ≅10% AI and 100% software engineering.