Slide 1

Slide 1 text

LLMs & Embeddings in Action: Real-World Patterns für Ihre AI-Anwendungen Christian Weyer | Co-Founder & CTO | Thinktecture AG | [email protected]

Slide 2

Slide 2 text

LLMs & Embeddings in Action Real-World Patterns für Ihre AI-Anwendungen Our journey Models for our software Lightweight RAG Semantic Routing Observability Structured Output / Tool Calling 2

Slide 3

Slide 3 text

LLMs & Embeddings in Action Real-World Patterns für Ihre AI-Anwendungen MODELS FOR OUR SOFTWARE 3

Slide 4

Slide 4 text

Language Models understand and generate semantically rich human language, transforming it into text or structured data for both humans and machines. ⚠ Non-deterministic: same input can lead to different outputs. Embedding Models capture semantic meaning by encoding human language into numerical vector representations, facilitating understanding, comparison, and retrieval for both humans and machines. ✅ Deterministic: same input always results in the same embedding. LLMs & Embeddings in Action Real-World Patterns für Ihre AI-Anwendungen 🫱 🫲 Semantic AI Generative AI 4

Slide 5

Slide 5 text

§ Language & embedding models part of end-to-end architectures § E-M enable semantic search & comparison § L-M enable human language understanding via context § System prompt § Conversation history § User query LLMs & Embeddings in Action Real-World Patterns für Ihre AI-Anwendungen API-based model integrations 5

Slide 6

Slide 6 text

LLMs & Embeddings in Action Real-World Patterns für Ihre AI-Anwendungen Classical applications & UIs API-based data Document-based data 6

Slide 7

Slide 7 text

LLMs & Embeddings in Action Real-World Patterns für Ihre AI-Anwendungen Language-enabled “UIs” – Talk-to-TT 7

Slide 8

Slide 8 text

LLMs & Embeddings in Action Real-World Patterns für Ihre AI-Anwendungen C4 system context diagram § Various tech stacks § Docker-based distributed system 8

Slide 9

Slide 9 text

LLMs & Embeddings in Action Real-World Patterns für Ihre AI-Anwendungen PATTERN LIGHTWEIGHT RAG 9

Slide 10

Slide 10 text

LLMs & Embeddings in Action Real-World Patterns für Ihre AI-Anwendungen Talking to documents (Retrieval-augmented generation) Cleanup & Split Text Embedding Question Text Embedding Save Query Relevant Results Question Answ er w / sources LLM Embedding Model Embedding Model 💡 Indexing / Embedding Question Answering .md, .docx, .pdf etc. “Lorem ipsum…?” 💡 Vector DB 10

Slide 11

Slide 11 text

§ Frameworks § LangChain § FastEmbed § Lightweight & efficient for generating text embeddings § Embedding model § jinaai/jina-embeddings-v2-base-de (local) – 768 dims § Vector store § PostgreSql (pgvector) vector store § LLM/SLM § Llama 3.3 70B on Cerebras (very fast) LLMs & Embeddings in Action Real-World Patterns für Ihre AI-Anwendungen Technical implementation – Lightweight RAG 11

Slide 12

Slide 12 text

LLMs & Embeddings in Action Real-World Patterns für Ihre AI-Anwendungen PATTERN STRUCTURED OUTPUT 12

Slide 13

Slide 13 text

§ Tools integration is being standardized with MCP LLMs & Embeddings in Action Real-World Patterns für Ihre AI-Anwendungen Talking to APIs (Function / Tool calling) 13 “When is CW available for a two-days workshop?” System Prompt (+ employee data) + Schema (for structured output) Web API Availability business logic

Slide 14

Slide 14 text

§ Frameworks § Pydantic § Instructor § Methodology § Schema with JSON Mode (not Function Calling) § SLM/LLM § Llama 3.3 70B on Cerebras (very fast) LLMs & Embeddings in Action Real-World Patterns für Ihre AI-Anwendungen Technical implementation – Structured Output 14

Slide 15

Slide 15 text

LLMs & Embeddings in Action Real-World Patterns für Ihre AI-Anwendungen PATTERN SEMANTIC ROUTING 15

Slide 16

Slide 16 text

LLMs & Embeddings in Action Real-World Patterns für Ihre AI-Anwendungen Semantics-based decisions for user interactions Guarding (e.g. prompt injection) Routing (selecting correct target) “Lorem ipsum…?” Target RAG Target API Call Target … something else … Fine-tuned NLP Model Embedding Model 16

Slide 17

Slide 17 text

Guarding § Frameworks § llm-guard § HuggingFace Transformers § NLP model § deepset/ deberta-v3-base-injection (local) Routing § Frameworks § semantic-routing § FastEmbed § Embedding model § intfloat/ multilingual-e5-large (local) – 1024 dims § Vector store § PostgreSql (pgvector) LLMs & Embeddings in Action Real-World Patterns für Ihre AI-Anwendungen Technical implementation – Semantic Guarding & Routing 17

Slide 18

Slide 18 text

LLMs & Embeddings in Action Real-World Patterns für Ihre AI-Anwendungen PATTERN / SOLUTION OBSERVABILITY 18

Slide 19

Slide 19 text

LLMs & Embeddings in Action Real-World Patterns für Ihre AI-Anwendungen Things can get… overwhelming 19

Slide 20

Slide 20 text

§ Methodology § Open Telemetry (OTel) § Frameworks § OTel Python packages § LogFire SDK § Tools § LogFire, LangFuse § Any OTel-enabled system LLMs & Embeddings in Action Real-World Patterns für Ihre AI-Anwendungen Technical implementation - Observability 20

Slide 21

Slide 21 text

LLMs & Embeddings in Action Real-World Patterns für Ihre AI-Anwendungen Typical Semantic AI patterns & solutions – in end-to-end software engineering Lightweight RAG Structured Output Semantic Guarding & Routing Insightful Observability 21

Slide 22

Slide 22 text

LLMs & Embeddings in Action Real-World Patterns für Ihre AI-Anwendungen 22 AI solutions are ≅10% AI and 100% software engineering. 22

Slide 23

Slide 23 text

Thank you! Christian Weyer [email protected] https://thinktecture.com/christian-weyer