Modular RAG Architectures with Java and Spring AI

Retrieval-Augmented Generation (RAG) is a foundational approach to enhancing LLMs by integrating them with targeted, domain-specific data. This session will provide a hands-on guide to designing and implementing modular RAG architectures using Java and Spring AI. Through practical examples, you will gain insights into building production-ready RAG workflows that support both current and future use cases.

The session will cover five key stages:
- Data Indexing: Methods for structuring, chunking, and indexing documents, improving the efficiency and accuracy of downstream retrieval tasks.
- Query Analysis: Techniques for transforming and enhancing queries in the pre-retrieval phase to improve relevance and precision during retrieval.
- Retrieval: Strategies for sourcing data from multiple repositories, using agent-based routing to select relevant data sources, and combining, re-ranking, or filtering results to deliver the most contextually relevant information for generation.
- Augmentation: Methods for augmenting user prompts with retrieved content, including advanced techniques for contextualization, citation, and summarization.
- Evaluation: Metrics and evaluation methods to assess document relevance and model output accuracy, as well as iterative feedback loops for refining RAG workflows to meet quality standards.
Additionally, the session will address important considerations for observability and developer experience in building RAG workflows.
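As a rough illustration of how the indexing, retrieval, and augmentation stages fit together, here is a minimal plain-Java sketch. It deliberately uses no Spring AI APIs: the class `ModularRagSketch` and all of its methods are hypothetical names, and term-frequency cosine similarity is only a stand-in for the embedding-based semantic search a vector store would provide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch, not the Spring AI API.
final class ModularRagSketch {

    // Indexing: split text into chunks of at most maxWords words.
    static List<String> chunk(String text, int maxWords) {
        if (maxWords <= 0) throw new IllegalArgumentException("maxWords must be positive");
        List<String> chunks = new ArrayList<>();
        if (text.isBlank()) return chunks;
        String[] words = text.trim().split("\\s+");
        for (int start = 0; start < words.length; start += maxWords) {
            int end = Math.min(start + maxWords, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
        }
        return chunks;
    }

    // Bag-of-words term counts (assumption: stands in for an embedding).
    static Map<String, Integer> termCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    static double norm(Map<String, Integer> vector) {
        double sum = 0;
        for (int count : vector.values()) sum += (double) count * count;
        return Math.sqrt(sum);
    }

    // Cosine similarity between two sparse term-count vectors.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += (double) e.getValue() * b.getOrDefault(e.getKey(), 0);
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0 || normB == 0) ? 0 : dot / (normA * normB);
    }

    // Retrieval: return the topK chunks most similar to the query.
    static List<String> retrieve(List<String> chunks, String query, int topK) {
        Map<String, Integer> queryVector = termCounts(query);
        return chunks.stream()
                .sorted(Comparator.comparingDouble(
                        (String c) -> cosine(termCounts(c), queryVector)).reversed())
                .limit(topK)
                .collect(Collectors.toList());
    }

    // Augmentation: number each retrieved chunk so an answer can cite it.
    static String augment(String question, List<String> context) {
        StringBuilder prompt =
                new StringBuilder("Answer using only the context below.\n\nContext:\n");
        for (int i = 0; i < context.size(); i++) {
            prompt.append('[').append(i + 1).append("] ").append(context.get(i)).append('\n');
        }
        return prompt.append("\nQuestion: ").append(question).toString();
    }

    public static void main(String[] args) {
        List<String> chunks = chunk(
                "Spring AI integrates vector stores. Vaadin builds web user interfaces.", 5);
        List<String> context = retrieve(chunks, "vector stores in Spring", 1);
        System.out.println(augment("What does Spring AI integrate?", context));
    }
}
```

Keeping each stage behind its own method is what makes the design modular: the word-based chunker or the similarity function can be swapped for a real splitter or vector store without touching the augmentation step.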

Thomas Vitale

May 23, 2025

Transcript

  1. Architecture: Application (Spring Boot Application: Vaadin, Spring AI, Arconia); Inference Service (consumes LLMs); Database (reads/writes data); Observability Platform (exports telemetry). @thomasvitale.com
  2. Arconia Dev Services and OpenTelemetry: arconia dev; gradle bootRun; mvn spring-boot:run. arconia.io
  3. Prompt Stuffing (Augmenting Prompts with Context): Application, Inference Service; Question, Context, Request, Response, Answer.
  4. Retrieval Augmented Generation (Web Search Engines): Application, Inference Service, Search Engine (HTTP API); Question, Augment with Context, Context, Request, Response, Answer.
  5. Retrieval Augmented Generation (Vector Stores): Application, Inference Service, Vector Store (Semantic Search); Question, Augment with Context, Request, Response, Answer.
  6. Conditional RAG (Query Routing): Query, Router; the Router routes to one of three branches, each consisting of Retrieval, Generation, Response.
  7. Retrieval Augmented Generation (Prompt Augmentation with Retrieved Context): Application, Inference Service, Source; Question, Query, Augment with Context, Request, Response, Answer.
  8. Tools (Tool Calling): Application, Inference Service, API; Question, Request, Tool Call Request, Tool Execution, Tool Call, Tool Call Response, Response, Answer.
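
The query routing shown on the Conditional RAG slide can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the talk's implementation: `QueryRouterSketch`, its route names, and its keyword lists are all hypothetical, and simple keyword matching stands in for the model-based router that an agent-style setup would use to pick a data source.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, not the Spring AI API.
final class QueryRouterSketch {

    // Route name -> lower-case keywords that select the route (assumption:
    // a production router would more likely ask an LLM to classify the query).
    private final Map<String, List<String>> routeKeywords = new LinkedHashMap<>();

    // Route name -> retriever that returns context documents for a query.
    private final Map<String, Function<String, List<String>>> retrievers = new LinkedHashMap<>();

    private final String defaultRoute;

    QueryRouterSketch(String defaultRoute, Function<String, List<String>> defaultRetriever) {
        this.defaultRoute = defaultRoute;
        retrievers.put(defaultRoute, defaultRetriever);
    }

    QueryRouterSketch addRoute(String name, List<String> keywords,
                               Function<String, List<String>> retriever) {
        routeKeywords.put(name, keywords);
        retrievers.put(name, retriever);
        return this;
    }

    // Pick the first route with a matching keyword, else the default route.
    String route(String query) {
        String normalized = query.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, List<String>> entry : routeKeywords.entrySet()) {
            for (String keyword : entry.getValue()) {
                if (normalized.contains(keyword)) return entry.getKey();
            }
        }
        return defaultRoute;
    }

    // Run retrieval against whichever source the router selected.
    List<String> retrieve(String query) {
        return retrievers.get(route(query)).apply(query);
    }

    public static void main(String[] args) {
        QueryRouterSketch router =
                new QueryRouterSketch("vector-store", q -> List.of("doc:" + q))
                        .addRoute("web-search", List.of("latest", "today"),
                                q -> List.of("web:" + q));
        System.out.println(router.route("Latest Spring AI release"));
        System.out.println(router.retrieve("How do I chunk documents?"));
    }
}
```

Each route owns its retriever, so adding a new data source means registering one more route rather than changing the generation step.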