Giving Eyes to Your AI: Building Vision-Enabled Agents with Haystack

Bilge Yücel, Developer Relations Engineer @ deepset

Hello 👋
Bilge Yücel
• Developer Relations Engineer at deepset
• B.Sc. Computer Science
• M.Sc. Artificial Intelligence
• Learning & teaching how to build with AI
in/bilge-yucel @bilgeyucl

Who is deepset?
Company: Solving custom AI challenges since 2018. HQ in Berlin and NYC.
Backed by: [logos]
Products: Leading open source framework & commercial platforms for custom enterprise-grade AI
Used by 70 Thought leaders

Agenda
01 Core Concepts: Agents & Workflows & VLMs
02 Agents in Haystack
03 Giving Eyes to Our AI: Colab & Demo
04 Q & A

Where are you on your agent journey?

Agents & Workflows & VLMs

What is an AI agent?
An AI agent is an LLM-based system that autonomously pursues a goal by interacting with its environment using tools.
Diagram labels: Human, LLM Call, Environment, Action, Feedback, Stop
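The loop in this definition (LLM call, action, feedback from the environment, stop) can be sketched in plain Python. This is a minimal illustration, not Haystack code: `scripted_llm`, `TOOLS`, `add` and `run_agent` are hypothetical names, and the "LLM" is a hard-coded stand-in so the example runs without a model.

```python
from typing import Callable, Dict, List

def add(a: float, b: float) -> float:
    """Hypothetical tool: the 'environment' the agent can act on."""
    return a + b

TOOLS: Dict[str, Callable] = {"add": add}

def scripted_llm(history: List[dict]) -> dict:
    """Stand-in for an LLM call: pick the next step from the history."""
    for message in history:
        if message["role"] == "tool":
            # Feedback has arrived, so the goal is reached: stop.
            return {"type": "stop", "answer": f"The result is {message['content']}"}
    # No feedback yet: ask for an action.
    return {"type": "action", "tool": "add", "args": {"a": 2, "b": 3}}

def run_agent(goal: str, llm: Callable, tools: Dict[str, Callable], max_steps: int = 5) -> str:
    history = [{"role": "human", "content": goal}]
    for _ in range(max_steps):
        decision = llm(history)                                  # LLM call
        if decision["type"] == "stop":                           # Stop
            return decision["answer"]
        result = tools[decision["tool"]](**decision["args"])     # Action
        history.append({"role": "tool", "content": result})      # Feedback
    return "Stopped: step limit reached"

answer = run_agent("What is 2 + 3?", scripted_llm, TOOLS)
print(answer)
```

The step limit is a design choice added here for safety: since the model decides when to stop, a real agent usually needs a cap on iterations.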

Agents vs Workflows
Agents
• Autonomous system
• You have complex, multi-step problems requiring diverse actions
• Tasks involve multiple tools / sources
• Goals are clear but the optimal path to achieve them isn't predetermined
Workflows
• Defined flow
• When interactions can follow predictable patterns (e.g. Q&A)
• Tasks can be decomposed into clear steps
• Stability, robustness and efficiency are prioritized over automation potential
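For contrast with an agent, a workflow fixes the sequence of steps in code. The sketch below is a toy Q&A workflow in plain Python under stated assumptions: `DOCUMENTS`, `retrieve`, `build_prompt`, `generate` and `qa_workflow` are hypothetical names, retrieval is naive keyword matching, and `generate` is a stub that echoes the retrieved context instead of calling a model.

```python
from typing import List

DOCUMENTS = [
    "Haystack is an open-source AI orchestration framework by deepset.",
    "Hayhooks serves Haystack pipelines as REST APIs.",
]

def _keywords(text: str) -> set:
    # Lowercased words longer than four characters, punctuation stripped.
    words = {w.strip("?.,").lower() for w in text.split()}
    return {w for w in words if len(w) > 4}

def retrieve(question: str, documents: List[str]) -> List[str]:
    """Step 1: naive keyword retrieval."""
    wanted = _keywords(question)
    return [d for d in documents if wanted & _keywords(d)]

def build_prompt(question: str, context: List[str]) -> str:
    """Step 2: assemble the prompt."""
    return "Context:\n" + " ".join(context) + f"\nQuestion: {question}"

def generate(prompt: str) -> str:
    """Step 3: stand-in for an LLM call; echoes the context line."""
    context_line = prompt.splitlines()[1]
    return context_line if context_line else "I don't know."

def qa_workflow(question: str) -> str:
    # The order of steps is fixed by the developer, not chosen by a model.
    context = retrieve(question, DOCUMENTS)
    prompt = build_prompt(question, context)
    return generate(prompt)

print(qa_workflow("What is Hayhooks?"))
```

The path never changes between runs, which is where the stability and efficiency of a workflow come from.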

What is a Vision Language Model?
How is it different from an LLM?
• Text → Text: Large Language Models (LLMs)
• Text + Image → Text: Vision Language Models (VLMs)
VLMs are a type of generative model that takes image and text inputs and generates text outputs. Suitable for Visual Q&A.
Hugging Face article: https://huggingface.co/blog/vlms
• How to choose the right Vision Language Model
• Technical Details
• Fine-tuning Vision Language Models

Agents in Haystack

• Open-source AI orchestration framework by deepset
• Backbone of the deepset AI Platform
• Provides the tools that Python developers need to build real-world, agentic AI systems with visibility, control and modularity
• Building blocks: Components & Pipelines

Haystack Agents
User Request → Agent → Generated Answer
Agent:
• LM (e.g. OpenAI, Anthropic, Google, Open Models)
• System Prompt
• Tools: Python Functions, External APIs, Haystack Components, MCP Servers

Giving Eyes to Your AI: Image + PDF

Multimodal Agent
• Can process images + PDFs
• Access to tool: get weather
• 🏗 Orchestration: Haystack
• 🧠 Model Provider: OpenAI
• 💻 Platform: Google Colab
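A setup like this one could look roughly as follows. This is a sketch, not the code from the Colab demo: it assumes a recent Haystack 2.x release that ships `Agent`, `create_tool_from_function` and `ImageContent`, and an `OPENAI_API_KEY` environment variable. The model name `gpt-4o-mini`, the system prompt, the stub weather data and the helpers `build_agent` and `ask_about_image` are assumptions made for illustration, so check the signatures against the Haystack documentation for your version. The Haystack imports are deferred into the functions so the weather tool can be tried on its own.

```python
# Hypothetical stub data; a real tool would call a weather API.
FAKE_WEATHER = {"berlin": "18°C, cloudy", "new york": "24°C, sunny"}

def get_weather(city: str) -> str:
    """Return a short weather report for the given city."""
    report = FAKE_WEATHER.get(city.lower())
    if report is None:
        return f"No weather data for {city}"
    return f"Weather in {city}: {report}"

def build_agent():
    """Assemble an agent with one tool (requires haystack-ai and OPENAI_API_KEY)."""
    from haystack.components.agents import Agent
    from haystack.components.generators.chat import OpenAIChatGenerator
    from haystack.tools import create_tool_from_function

    weather_tool = create_tool_from_function(get_weather)
    return Agent(
        chat_generator=OpenAIChatGenerator(model="gpt-4o-mini"),  # assumed model name
        tools=[weather_tool],
        system_prompt="Answer questions about images. Use tools when needed.",
    )

def ask_about_image(agent, image_path: str, question: str) -> str:
    """Send a question plus an image file to the agent and return its reply."""
    from haystack.dataclasses import ChatMessage, ImageContent

    image = ImageContent.from_file_path(image_path)
    message = ChatMessage.from_user(content_parts=[question, image])
    result = agent.run(messages=[message])
    return result["messages"][-1].text

print(get_weather("Berlin"))
```

With a key configured, usage would be along the lines of `ask_about_image(build_agent(), "photo.jpg", "Which city is this, and what is the weather there?")`, where `photo.jpg` is a placeholder path.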

Hayhooks
• Tool to deploy and serve Haystack Pipelines and Agents as REST APIs and MCP Servers
• Wrap Pipelines and Agents with custom logic and expose them via HTTP endpoints
• OpenAI-compatible chat completion endpoints → Open WebUI

Multimodal Agent
• Can process images + PDFs
• Access to tool: get weather
• 🏗 Orchestration: Haystack
• 🧠 Model Provider: OpenAI
• 🔌 REST API: Hayhooks
• 🖼 UI: Open WebUI

Next Steps
• Provide more tools
◦ Connect to MCP Servers
◦ Use a multimodal RAG pipeline as a Tool
◦ Prebuilt GitHub tools
• Try open models through Ollama, vLLM, Hugging Face
• Think about observability, guardrails

Building AI Agents with Haystack
• Build a Tool-Calling Agent
• AI Guardrails: Content Moderation and Safety with Open Language Models
• Creating a Multi-Agent System with Haystack
• Trace and Evaluate RAG with Arize Phoenix
• DevOps Support Agent with Human in the Loop
• Build a GitHub Issue Resolver Agent
• Build a GitHub PR Creator Agent
• Building AI Agents with Haystack

Thank you
www.deepset.ai
www.haystack.deepset.ai
in/bilge-yucel @bilgeyucl
