Beyond LLMs: Using Embedding Models for Input Guarding, Semantic Routing, and Tool Decisions

Retrieval-Augmented Generation (RAG) leverages retrievers such as vector databases to fetch relevant data for answering queries. In advanced RAG setups involving multiple data sources, selecting the best retriever is critical. In LangChain, this is traditionally handled by a MultiRouteChain, where a Large Language Model (LLM) dynamically chooses the optimal data source based on semantic fit. However, this approach can be slow, costly, and unpredictable.
Enter the open-source library Semantic Router: a faster, cheaper, and deterministic alternative that uses an embedding model for retriever selection without compromising quality.
In this talk, I'll showcase Semantic Router's broader capabilities, including input guarding for AI applications and efficient tool selection for function calling.
Through live coding, we'll first build a traditional MultiRouteChain and then optimize it with Semantic Router, illustrating how this transformation can dramatically improve efficiency in RAG workflows.
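To make the core idea concrete before the slides: below is a minimal from-scratch sketch of semantic routing (my own illustration, not Semantic Router's actual code). Example utterances per route are embedded once; each question is then assigned to the nearest route by cosine similarity, with no LLM generation involved. The model name and the routes are assumptions for the example.

```python
# Semantic routing in miniature: no LLM generation, just one embedding
# call per question and a nearest-neighbor lookup over route utterances.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence-embedding model works

routes = {
    "products": ["what does the laptop cost?", "is item 42 in stock?"],
    "hr": ["how many vacation days do I get?", "what is the parental leave policy?"],
}
# Embed the example utterances once, at startup.
route_vecs = {name: model.encode(utts, normalize_embeddings=True)
              for name, utts in routes.items()}

def route(question: str) -> str:
    q = model.encode([question], normalize_embeddings=True)[0]
    # On normalized vectors, cosine similarity is a dot product; pick the
    # route whose closest utterance best matches the question.
    return max(routes, key=lambda name: float(np.max(route_vecs[name] @ q)))

print(route("do I get paid leave after the birth of my child?"))  # -> "hr"
```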

Marco Frodl

June 25, 2025

Transcript

  1. Title slide: MLCon 2025, "Beyond LLMs: Using Embedding Models for Input Guarding, Semantic Routing, and Tool Decisions". Marco Frodl, Principal Consultant for Generative AI, [email protected], @marcofrodl
  2. About Me: Marco Frodl, Principal Consultant for Generative AI, Thinktecture AG. X: @marcofrodl, E-Mail: [email protected], LinkedIn: https://www.linkedin.com/in/marcofrodl/, https://www.thinktecture.com/thinktects/marco-frodl/
  3. Turbo 🚀 (https://www.aurelio.ai/semantic-router): "Semantic Router is a superfast decision-making layer for your LLMs and agents. Rather than waiting for slow, unreliable LLM generations to make tool-use or safety decisions, we use the magic of semantic vector space — routing our requests using semantic meaning."
  4. Turbo 🚀 (same slide, one line added): "It's perfect for: input guarding, topic routing, tool-use decisions."
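A minimal sketch of what such a route layer looks like in code, following the semantic-router quickstart as of its 0.0.x releases (later versions renamed RouteLayer to SemanticRouter); the routes and utterances here are illustrative, not from the talk.

```python
# Routes are defined purely by example utterances; the encoder embeds them
# once, and each request is routed by vector similarity - no LLM involved.
from semantic_router import Route
from semantic_router.encoders import OpenAIEncoder
from semantic_router.layer import RouteLayer

politics = Route(
    name="politics",  # an input-guarding route: match it, refuse early
    utterances=["who should I vote for?", "share your political opinions"],
)
smalltalk = Route(
    name="smalltalk",  # a topic-routing route
    utterances=["how are you today?", "lovely weather, isn't it?"],
)

rl = RouteLayer(encoder=OpenAIEncoder(), routes=[politics, smalltalk])

choice = rl("don't you love politics?")  # returns a RouteChoice
if choice.name == "politics":
    print("Guarded: declining before any LLM tokens are spent.")
```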
  5. Turbo 🚀 in Numbers: In my RAG example, Semantic Router using remote services is 3.4 times faster than an LLM and 30 times less expensive. Semantic Router with a local model is 7.7 times faster and 60 times less expensive than the LLM.
  6. Really? More Safety? Better Speed? Less Budget?
  7. Refresher: What is RAG? "Retrieval-Augmented Generation (RAG) extends the capabilities of LLMs to an organization's internal knowledge, all without the need to retrain the model."
  8. Refresher: What is RAG? (same slide, quote completed): "It references an authoritative knowledge base outside of its training data sources before generating a response." Source: https://aws.amazon.com/what-is/retrieval-augmented-generation/
  9. Simple RAG ("Ask me anything"). Workflow diagram: Question → Prepare Search (Embedding Model, question as vector) → Vector DB → Search Results + Question → LLM → Answer. Terms: Retriever, Chain. Elements: Embedding Model, Vector DB, LLM, Python, LangChain.
  10. Demo: Simple RAG
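The demo itself is live code; as a stand-in, here is a minimal sketch of the Simple RAG workflow from the diagram above, wired with standard LangChain components. The model names, the FAISS store, and the sample document are assumptions.

```python
# Simple RAG: embed the question, search the vector DB, answer with the LLM.
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# A (tiny) vector DB as retriever - the "Prepare Search" and "Search" steps.
vector_db = FAISS.from_texts(
    ["Products can be returned within 30 days for a full refund."],
    embedding=OpenAIEmbeddings(),
)
retriever = vector_db.as_retriever()

prompt = ChatPromptTemplate.from_template(
    "Answer using only this context:\n{context}\n\nQuestion: {question}"
)
llm = ChatOpenAI(model="gpt-4o-mini")

question = "How long do I have to return a product?"
docs = retriever.invoke(question)  # search results for the question vector
answer = (prompt | llm).invoke(
    {"context": "\n\n".join(d.page_content for d in docs), "question": question}
)
print(answer.content)
```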
  11. Simple RAG in a nutshell: our sample content.
  12. Multiple Retrievers: which retriever do you want?
  13. Advanced RAG: best source determination before the search. Workflow diagram: Question → Retriever Selection (LLM) → Prepare Search (Embedding Model, question as vector) → Vector DB A or Vector DB B → 0-N Search Results + Question → LLM → Answer.
  14. Demo: Dynamic Retriever Selection with LLM
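Again a stand-in for the live demo: the "traditional" variant, where an extra LLM call picks the data source per request. Source names and prompt wording are assumptions; this extra round-trip is exactly the latency, cost, and unpredictability the later slides measure.

```python
# Dynamic retriever selection with an LLM: one extra generation per request.
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

route_prompt = ChatPromptTemplate.from_template(
    "Reply with exactly one word, 'products' or 'hr', naming the best "
    "data source for this question:\n\n{question}"
)
router = route_prompt | ChatOpenAI(model="gpt-4o-mini", temperature=0)

retrievers = {
    "products": ...,  # Vector DB A as a retriever, e.g. FAISS(...).as_retriever()
    "hr": ...,        # Vector DB B as a retriever
}

def select_retriever(question: str):
    # Slow, costly, and not strictly deterministic: the LLM may phrase its
    # answer unexpectedly, so real code needs parsing and fallbacks.
    choice = router.invoke({"question": question}).content.strip().lower()
    return retrievers.get(choice, retrievers["products"])
```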
  15. Embedding Model
  16. Advanced RAG (diagram repeated from slide 13).
  17. Advanced RAG w/ Semantic Router: best source determination before the search. Workflow diagram: Question → Retriever Selection (Embedding Model in place of the LLM) → Prepare Search (question as vector) → Vector DB A or Vector DB B → 0-N Search Results + Question → LLM → Answer.
  18. Demo: Semantic Router with RAG
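A stand-in sketch for this demo as well: the LLM router swapped for a Semantic Router route layer. Route names map one-to-one to retrievers, and selection becomes a single embedding call plus a similarity lookup (same hedges as before: 0.0.x-era API, illustrative names).

```python
# Retriever selection with Semantic Router instead of an LLM.
from semantic_router import Route
from semantic_router.encoders import OpenAIEncoder
from semantic_router.layer import RouteLayer

products = Route(name="products", utterances=[
    "what does the X200 cost?", "is the laptop in stock?"])
hr = Route(name="hr", utterances=[
    "how many vacation days do I get?", "what is the parental leave policy?"])

rl = RouteLayer(encoder=OpenAIEncoder(), routes=[products, hr])

retrievers = {
    "products": ...,  # Vector DB A as a retriever
    "hr": ...,        # Vector DB B as a retriever
}

def select_retriever(question: str):
    choice = rl(question).name  # deterministic for a given encoder and routes
    # A route layer can also return no match (name is None): fall back then.
    return retrievers.get(choice, retrievers["products"])
```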
  19. LLM as Router: Turbo 🐌
  20. Semantic Router with remote embedding model: Turbo 🚀
  21. Demo: Semantic Router running locally
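For the local variant, semantic-router also ships encoders that run on-device. A sketch using its HuggingFaceEncoder is below; the model name matches the library's usual default, but treat both the class arguments and the model choice as assumptions.

```python
# Same route-layer idea, but the embedding model runs locally: no API
# round-trip, which is where the larger speedup on the next slides comes from.
from semantic_router import Route
from semantic_router.encoders import HuggingFaceEncoder
from semantic_router.layer import RouteLayer

encoder = HuggingFaceEncoder(name="sentence-transformers/all-MiniLM-L6-v2")
chitchat = Route(name="chitchat", utterances=["hi there!", "how's it going?"])

rl_local = RouteLayer(encoder=encoder, routes=[chitchat])
print(rl_local("hello!").name)  # runs fully offline once the model is cached
```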
  22. Semantic Router with local embedding model: Turbo 🚀
  23. Speed & Budget in Numbers: SR Remote is 3.4 times faster than the LLM (0.62 s vs 0.18 s); SR Local is 7.75 times faster (0.62 s vs 0.08 s); SR Remote is 30 times cheaper ($0.60 vs $0.02); SR Local is 60 times cheaper ($0.60 vs $0.01). (LLM figure first in each pair.)
  24. Tool-use Decisions: tool-calling workflow with LLM. Diagram: Question → LLM ("Tool-Call needed?") → either Regular LLM-Call → Answer, or Prepare Tool-Call (LLM) → Execute Tool-Call → "More Tools?" → Answer. Three LLM boxes in the flow.
  25. Tool-use Decisions: tool-calling workflow with LLM + Semantic Router. Diagram: Question → Embedding Model ("Tool-Call needed?") → either Regular LLM-Call → Answer, or Prepare Tool-Call (LLM) → Execute Tool-Call → "More Tools?" → Answer. The decision box is now an Embedding Model; two LLM boxes remain.
  26. Demo: Semantic Router tool-use decisions
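One last stand-in sketch: a route layer answering the diagram's "Tool-Call needed?" question before any LLM runs, so the expensive model is only invoked when generation is actually required. The get_time route and its handler are illustrative.

```python
# Tool-use decision via embeddings: the route layer answers "tool call
# needed?" before any LLM is called.
from datetime import datetime, timezone
from semantic_router import Route
from semantic_router.encoders import OpenAIEncoder
from semantic_router.layer import RouteLayer

time_route = Route(name="get_time", utterances=[
    "what time is it?", "tell me the current time"])
rl = RouteLayer(encoder=OpenAIEncoder(), routes=[time_route])

def handle(question: str) -> str:
    if rl(question).name == "get_time":
        # Tool-call branch: execute directly here; in the full workflow an
        # LLM would first prepare structured arguments for the call.
        return datetime.now(timezone.utc).strftime("%H:%M UTC")
    return "(regular LLM call)"  # no tool matched: fall through to generation

print(handle("what time is it?"))
```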
  27. Yes, please! More Safety! Better Speed! Less Budget!