BASTA! Spring 2025: Semantic AI in Action: Architektur-Patterns für LLMs & Embeddings

Semantic AI is the key to integrating AI into your own solutions. In this talk, Christian Weyer presents practical architecture patterns and approaches for using large and small language models such as GPT or Llama, as well as embedding models, in modern software architectures. Key concepts such as semantic routing, semantic search and RAG, structured output, and observability are demonstrated on an end-to-end system with multiple services and client applications. Developers and architects get a pragmatic overview of how these ideas could be implemented in their own projects.

Christian Weyer

March 04, 2025

Transcript

  1. Semantic AI in Action: Architektur-Patterns für LLMs & Embeddings. Semantic AI 🫱🫲 Generative AI

    Language Models understand and generate semantically rich human language, transforming it into text or structured data for both humans and machines. ⚠ Non-deterministic: the same input can lead to different outputs.

    Embedding Models capture semantic meaning by encoding human language into numerical vector representations, facilitating understanding, comparison, and retrieval for both humans and machines. ✅ Deterministic: the same input always results in the same embedding.
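The two embedding properties named on this slide, determinism and comparability, can be illustrated with a minimal sketch. `toy_embed` is a hypothetical stand-in (a hashed bag of words), not one of the embedding models used in the talk; it only shows the shape of the idea: text in, fixed-size numeric vector out, cosine similarity for comparison.

```python
import hashlib
import math

DIM = 64  # arbitrary toy dimensionality (assumption for this sketch)


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model: hashed word counts."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        # hashlib is used instead of hash() so results are stable across runs
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % DIM] += 1.0
    return vec


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equally sized vectors (0.0 if one is empty)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


first = toy_embed("when is the workshop")
second = toy_embed("when is the workshop")
related = toy_embed("when is the workshop scheduled")

same_input_same_vector = first == second   # deterministic
similarity = cosine(first, related)        # high for overlapping wording
```

A real embedding model captures meaning rather than word overlap, but the calling pattern (embed once, compare with a vector distance) is the same.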
  2. Model integrations

    - Language models (LLM, SLM) & embedding models (EM) in end-to-end architectures
    - EM enable semantic search & comparison
    - LM enable human language understanding via API calls
      - System prompt
      - User query
  3. ⚠ Important shoutout

    - We rely on the language model's language understanding, NOT on its world knowledge❗
  4. Sample solution - C4 system context diagram
  5. Sample solution - Technology stack

    Services
    - Python as the go-to platform for ML/AI/Gen-AI
      - Especially for local model execution
    - But: most of the logic could be implemented in any language/platform

    Clients
  6. “Talk to your data”

    💡 Indexing / Embedding: .md, .docx, .pdf etc. → Cleanup & Split → Text → Embedding Model → Embedding → Save → Vector DB

    💡 Question Answering: Question (“Lorem ipsum…?”) → Embedding Model → Embedding → Query Vector DB → Relevant Results → LLM (Question + Relevant Results) → Answer
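The two flows on this slide (indexing, then question answering) can be sketched end to end in a few lines. Everything here is an illustrative assumption: `toy_embed` is a word-count stand-in for an embedding model, `InMemoryVectorStore` stands in for a vector database, and the sample document is invented. The final LLM call is left out; the sketch stops at the prompt that would be sent.

```python
import math
import re
from collections import Counter


def split_into_chunks(document: str) -> list[str]:
    """Cleanup & split: normalize whitespace, one chunk per paragraph."""
    parts = re.split(r"\n\s*\n", document)
    return [" ".join(p.split()) for p in parts if p.strip()]


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: sparse word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector DB: keeps (chunk, embedding) rows."""

    def __init__(self) -> None:
        self.rows: list[tuple[str, Counter]] = []

    def save(self, chunk: str) -> None:
        self.rows.append((chunk, toy_embed(chunk)))

    def query(self, question: str, k: int = 2) -> list[tuple[float, str]]:
        q = toy_embed(question)
        scored = [(cosine(q, emb), chunk) for chunk, emb in self.rows]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


def build_prompt(question: str, results: list[tuple[float, str]]) -> str:
    """Combine the question with the retrieved chunks for the LLM."""
    context = "\n".join(f"- {chunk}" for _, chunk in results)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Invented sample document, for illustration only.
DOCUMENT = """Workshops are booked through the availability service.

The cafeteria serves lunch between noon and two.

Vector stores keep embeddings next to the original text."""

# Indexing / embedding
store = InMemoryVectorStore()
for chunk in split_into_chunks(DOCUMENT):
    store.save(chunk)

# Question answering (up to, but not including, the LLM call)
question = "How are workshops booked?"
results = store.query(question, k=1)
prompt = build_prompt(question, results)
```

Keeping retrieval separate from prompt construction makes it straightforward to swap the toy parts for a real embedding model and vector store later.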
  7. Technical implementation – Local RAG

    - Frameworks
      - LangChain
      - Fastembed: lightweight & efficient for generating text embeddings
    - Embedding model
      - jinaai/jina-embeddings-v2-base-de (local)
    - Vector store
      - PostgreSQL (pgvector)
    - LLM/SLM
      - Llama 3.3 70B on Cerebras (very fast)
  8. Structured data from unstructured input – e.g. for API calling

    “OK, when is my colleague CW available for a two-day workshop?” → System Prompt (with employee data) + Schema / Function Calling (for structured output) → Web API → Availability business logic
  9. Technical implementation – Structured Output

    - Frameworks
      - Pydantic
      - Instructor
    - Methodology
      - Schema with JSON Mode (not Function Calling)
    - SLM/LLM
      - Llama 3.3 70B on Cerebras (very fast)
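The "schema with JSON mode" approach boils down to: ask the model for JSON, then validate it against a typed schema before any business logic runs. The sketch below uses only the standard library as a stand-in for Pydantic and Instructor so that it runs without dependencies. The `AvailabilityRequest` fields and the canned reply from `fake_llm_json_mode` are assumptions made up for illustration, not the talk's actual schema.

```python
import json
from dataclasses import dataclass


@dataclass
class AvailabilityRequest:
    """Hypothetical schema for the availability question (assumed fields)."""
    expert: str
    days: int

    @classmethod
    def from_json(cls, raw: str) -> "AvailabilityRequest":
        # json.loads raises JSONDecodeError (a ValueError) on malformed JSON
        data = json.loads(raw)
        if not isinstance(data, dict):
            raise ValueError("expected a JSON object")
        expert = data.get("expert")
        days = data.get("days")
        if not isinstance(expert, str) or not expert:
            raise ValueError("expert must be a non-empty string")
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(days, bool) or not isinstance(days, int) or days < 1:
            raise ValueError("days must be a positive integer")
        return cls(expert=expert, days=days)


def fake_llm_json_mode(user_text: str) -> str:
    """Hypothetical stand-in for a JSON-mode model call; returns a canned reply."""
    return '{"expert": "CW", "days": 2}'


reply = fake_llm_json_mode(
    "OK, when is my colleague CW available for a two-day workshop?"
)
request = AvailabilityRequest.from_json(reply)
```

With Pydantic, the manual checks in `from_json` would be replaced by a model class with typed fields; the control flow (parse, validate, only then call the API) stays the same.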
  10. Semantics-based decisions for user interactions

    “Lorem ipsum…?” → Semantic Engine (Fine-tuned Language Model, Embedding Model)

    - Guarding (e.g. prompt injection)
    - Routing (selecting correct target)
      - Target: RAG 1
      - Target: Structured Output & API Call
      - Target: … something else …
  11. Technical implementation – Semantic Guarding & Routing

    Guarding
    - Frameworks
      - llm-guard
      - HuggingFace Transformers
    - Model
      - deepset/deberta-v3-base-injection

    Routing
    - Frameworks
      - semantic-routing
      - Fastembed
    - Embedding model
      - BAAI/bge-small-en-v1.5 (local)
    - Vector store
      - PostgreSQL (pgvector)
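The routing half of this slide can be approximated as nearest-neighbour search over example utterances: each target gets a few sample phrasings, the incoming query is embedded, and the best-scoring target wins if it clears a threshold. This is a sketch of the general technique, not the API of the routing library named above. The route names, utterances, threshold, and the word-count `toy_embed` are all assumptions. Guarding is not covered here, since it relies on a trained classifier model.

```python
import math
import re
from collections import Counter
from typing import Optional

# Hypothetical routes with invented example utterances.
ROUTES = {
    "rag": [
        "what does the handbook say about travel",
        "find documentation about onboarding",
    ],
    "availability": [
        "when is my colleague available",
        "is an expert free for a workshop",
    ],
}


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: sparse word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def route(query: str, threshold: float = 0.3) -> Optional[str]:
    """Return the best-matching route name, or None if nothing is close enough."""
    q = toy_embed(query)
    best_name, best_score = None, 0.0
    for name, utterances in ROUTES.items():
        for utterance in utterances:
            score = cosine(q, toy_embed(utterance))
            if score > best_score:
                best_name, best_score = name, score
    return best_name if best_score >= threshold else None


availability_target = route("when is CW available for a workshop")
rag_target = route("what does the handbook say about vacation")
no_target = route("pizza toppings")
```

Returning `None` below the threshold gives the caller an explicit "no route matched" case, which can fall back to a default target or a clarification prompt.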
  12. Technical implementation - Observability

    - Methodology
      - OpenTelemetry (OTel)
    - Frameworks
      - OTel Python packages
      - LogFire SDK
    - Tools
      - LogFire, LangFuse
      - Any OTel-enabled system
  13. "Talk to your systems" (for Availability info)

    Semantic routing

    Participants: Web App / Watch App, Speech-to-Text, Internal Gateway (Python FastAPI), LLM / SLM, Text-to-Speech, Internal Business API (node.js – veeeery old)

    - Web App / Watch App → Speech-to-Text: Transcribe spoken text → Transcribed text (“When is CL…?”)
    - Web App / Watch App → Internal Gateway: Check for expert availability with text
    - Internal Gateway → LLM / SLM: Extract { experts, booking times } from text → Structured JSON data (Function calling)
    - Internal Gateway → Internal Business API: Query Availability API → Availability
    - Internal Gateway → LLM / SLM: Generate response with availability → Response
    - Internal Gateway → Web App / Watch App: Response with expert availability (“CL will be…”)
    - 🔉 Web App / Watch App → Text-to-Speech: Text-to-speech for response → Response audio
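The gateway's role in this flow is orchestration: it calls each service in turn and passes the result along. A minimal sketch, assuming each step is injected as a plain callable; the `Gateway` class, its field names, and the lambda stubs with their placeholder data are all hypothetical and only mirror the labels on the slide.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Gateway:
    """Hypothetical orchestrator; every step is an injected callable."""
    transcribe: Callable[[bytes], str]             # speech-to-text
    extract: Callable[[str], dict]                 # LLM: text -> structured data
    query_availability: Callable[[dict], str]      # internal business API
    generate_response: Callable[[str, str], str]   # LLM: question + data -> answer
    synthesize: Callable[[str], bytes]             # text-to-speech

    def handle(self, audio: bytes) -> bytes:
        text = self.transcribe(audio)
        request = self.extract(text)
        availability = self.query_availability(request)
        answer = self.generate_response(text, availability)
        return self.synthesize(answer)


# Stubs with placeholder data; real services would sit behind HTTP calls.
gateway = Gateway(
    transcribe=lambda audio: audio.decode("utf-8"),
    extract=lambda text: {"expert": "CL"},
    query_availability=lambda request: f"{request['expert']}: placeholder slot",
    generate_response=lambda question, availability: f"Answer: {availability}",
    synthesize=lambda answer: answer.encode("utf-8"),
)

audio_out = gateway.handle(b"When is CL available?")
```

Injecting the steps keeps the orchestration testable without any model or network access, and lets the old business API be wrapped without touching the flow.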
  14. Recap: Top Semantic AI patterns & solutions – in end-to-end software engineering

    - Local RAG
    - Structured Output
    - Semantic Guarding & Routing
    - Insightful Observability

    💡 Fun fact: large parts have been built with AI-assisted coding
  15. Christian Weyer, Co-Founder & CTO @ Thinktecture AG

    - Technology catalyst
    - AI-powered solutions
    - Pragmatic end-to-end architectures
    - Microsoft Regional Director
    - Microsoft MVP for AI
    - Google GDE for Web AI

    [email protected] https://www.thinktecture.com