JAX 2025: Semantic AI in Action: Architektur-Patterns für LLMs & Embeddings

Slide 1

Slide 1 text

Semantic AI in Action Architektur-Patterns für LLMs & Embeddings Christian Weyer | Co-Founder & CTO | Thinktecture AG | [email protected]

Slide 2

Slide 2 text

§ Technology catalyst § AI-powered solutions § Pragmatic end-to-end architectures § Microsoft Regional Director § Microsoft MVP for AI § Google GDE for Web AI [email protected] https://www.thinktecture.com Semantic AI in Action Architektur-Patterns für LLMs & Embeddings Christian Weyer Co-Founder & CTO @ Thinktecture AG 2

Slide 3

Slide 3 text

Semantic AI in Action Architektur-Patterns für LLMs & Embeddings Our journey 3 Models for our software Lightweight RAG Semantic Routing Observability LLM all-the-things? Structured Output / Tool Calling

Slide 4

Slide 4 text

Semantic AI in Action Architektur-Patterns für LLMs & Embeddings LLM ALL-THE-THINGS? 4

Slide 5

Slide 5 text

Language Models understand and generate semantically rich human language, transforming it into text or structured data for both humans and machines. ⚠ Non-deterministic: same input can lead to different outputs. Embedding Models capture semantic meaning by encoding human language into numerical vector representations, facilitating understanding, comparison, and retrieval for both humans and machines. ✅ Deterministic: same input always results in the same embedding. Semantic AI in Action Architektur-Patterns für LLMs & Embeddings 5 🫱 🫲 Semantic AI Generative AI

Slide 6

Slide 6 text

Semantic AI in Action Architektur-Patterns für LLMs & Embeddings MODELS FOR OUR SOFTWARE 6

Slide 7

Slide 7 text

§ Language & embedding models part of end-to-end architectures § E-M enable semantic search & comparison § L-M enable human language understanding via context § System prompt § Conversation history § User query Semantic AI in Action Architektur-Patterns für LLMs & Embeddings API-based model integrations 7

Slide 8

Slide 8 text

Semantic AI in Action Architektur-Patterns für LLMs & Embeddings Classical applications & UIs 8

Slide 9

Slide 9 text

Semantic AI in Action Architektur-Patterns für LLMs & Embeddings Language-enabled “UIs” – Talk-to-TT sample 9

Slide 10

Slide 10 text

Semantic AI in Action Architektur-Patterns für LLMs & Embeddings 10 C4 system context diagram

Slide 11

Slide 11 text

Semantic AI in Action Architektur-Patterns für LLMs & Embeddings PATTERN LIGHTWEIGHT RAG [RETRIEVAL-AUGMENTED GENERATION] 11

Slide 12

Slide 12 text

Semantic AI in Action Architektur-Patterns für LLMs & Embeddings “Talk to your Data” Cleanup & Split Text Embedding Question Text Embedding Save Query Relevant Results Question Answ er w / sources LLM Embedding Model Embedding Model 💡 Indexing / Embedding Question Answering .md, .docx, .pdf etc. “Lorem ipsum…?” 💡 Vector DB 12

Slide 13

Slide 13 text

§ Frameworks § LangChain § FastEmbed § Lightweight & efﬁcient for generating text embeddings § Embedding model § jinaai/jina-embeddings-v2-base-de (local) § Vector store § PostgreSql (pgvector) vector store § LLM/SLM § Llama 3.3 70B on Cerebras (very fast) Semantic AI in Action Architektur-Patterns für LLMs & Embeddings Technical implementation – Lightweight RAG 13

Slide 14

Slide 14 text

Semantic AI in Action Architektur-Patterns für LLMs & Embeddings PATTERN STRUCTURED OUTPUT 14

Slide 15

Slide 15 text

§ Integration is being standardized with MCP Semantic AI in Action Architektur-Patterns für LLMs & Embeddings Structured data from unstructured input For calling APIs / tools 15 “OK, when is my colleague CW available for a two- days workshop?” System Prompt (with employee data) + Schema / Function Calling (for structured output) (Internal) Web API Availability business logic

Slide 16

Slide 16 text

§ Frameworks § Pydantic § Instructor § Methodology § Schema with JSON Mode (not Function Calling) § SLM/LLM § Llama 3.3 70B on Cerebras (very fast) Semantic AI in Action Architektur-Patterns für LLMs & Embeddings Technical implementation – Structured Output 16

Slide 17

Slide 17 text

Semantic AI in Action Architektur-Patterns für LLMs & Embeddings PATTERN SEMANTIC GUARDING & ROUTING 17

Slide 18

Slide 18 text

Semantic AI in Action Architektur-Patterns für LLMs & Embeddings Semantics-based decisions for user interactions Guarding (e.g. prompt injection) Routing (selecting correct target) “Lorem ipsum…?” Target RAG Target API Call Target … something else … Fine-tuned Language Model Embedding Model 18

Slide 19

Slide 19 text

Guarding § Frameworks § llm-guard § HuggingFace Transformers § Model § deepset/deberta-v3-base- injection Routing § Frameworks § semantic-routing § FastEmbed § Embedding model § intﬂoat/multilingual-e5- large § Vector store § PostgreSql (pgvector) Semantic AI in Action Architektur-Patterns für LLMs & Embeddings Technical implementation – Semantic Guarding & Routing 19

Slide 20

Slide 20 text

Semantic AI in Action Architektur-Patterns für LLMs & Embeddings PATTERN / SOLUTION OBSERVABILITY 20

Slide 21

Slide 21 text

Semantic AI in Action Architektur-Patterns für LLMs & Embeddings Things can get… overwhelming 21

Slide 22

Slide 22 text

Semantic AI in Action Architektur-Patterns für LLMs & Embeddings End-to-end tracing 22

Slide 23

Slide 23 text

§ Methodology § Open Telemetry (OTel) § Frameworks § OTel Python packages § LogFire SDK § Tools § LogFire, LangFuse § Any OTel-enabled system Semantic AI in Action Architektur-Patterns für LLMs & Embeddings Technical implementation - Observability 23

Slide 24

Slide 24 text

Semantic AI in Action Architektur-Patterns für LLMs & Embeddings END-TO-END SOLUTION ILLUSTRATED 24

Slide 25

Slide 25 text

Semantic routing Semantic AI in Action Architektur-Patterns für LLMs & Embeddings "Talk to your systems" - for Availability info 25 Web App / Watch App Speech-to-Text Internal Gateway (Python FastAPI) LLM / SLM Text-to-Speech Transcribe spoken text Transcribed text Check for experts availability with text Extract { experts, booking times } from text Structured JSON data (Function calling) Generate response with availability Response Response with experts availability 🔉 Speech-to-text for response Response audio Internal Business API (node.js – veeeery old) Query Availability API Availability When is CL…? CL will be…

Slide 26

Slide 26 text

Semantic AI in Action Architektur-Patterns für LLMs & Embeddings Recap: Top Semantic AI patterns & solutions – in end-to-end software engineering 26 Lightweight RAG Structured Output Semantic Guarding & Routing Insightful Observability 💡 Fun Fact: Large parts been built with AI-assisted Coding / Vibe Coding

Slide 27

Slide 27 text

Thank you! Christian Weyer https://thinktecture.com/christian-weyer [email protected] 27

Slide 28

Slide 28 text

Semantic AI in Action Architektur-Patterns für LLMs & Embeddings APPENDIX 28

Slide 29

Slide 29 text

Semantic AI in Action Architektur-Patterns für LLMs & Embeddings GEN AI EVERYWHERE 29

Slide 30

Slide 30 text

Semantic AI in Action Architektur-Patterns für LLMs & Embeddings Language & Embedding Models everywhere OpenAI-related (cloud) OpenAI Azure OpenAI Service Big cloud providers Google Model Garden on Vertex AI Amazon Bedrock Open-source Edge IoT Server Desktop Mobile Web Other providers Anthropic Google DeepMind Mistral AI Hugging Face Open-source 30

Slide 31

Slide 31 text

§ SLM families, e.g. § Llama § Mistral § Phi § Qwen § Success factors § Use case § Parameter size § Quantization § Local inference runtimes with APIs § E.g. llama.cpp, ollama, VLLM, ONNXRuntime Semantic AI in Action Architektur-Patterns für LLMs & Embeddings Open-source models thrive 31 § Local UIs § E.g. Open WebUI § Processing power needed § CPU optimization on its way § Embedding models often run great on CPU