Reduce LLM Calls with Vector Search

LLMs are powerful, but calling them for everything gets expensive, slow, and energy-hungry fast. What if you could handle common tasks like classification, routing, and caching without reaching for a massive model every time?

In this session, I’ll show you how to use vector search and semantic patterns to build smarter systems that skip unnecessary LLM calls and still deliver. We’ll cover:

• How semantic classification can match intent without tokens or prompts
• How to route requests based on meaning, not brittle rules
• How semantic caching helps you reuse answers and cut costs

You’ll see how to replace brute-force prompting with clean, efficient logic using embeddings, similarity, and lightweight decision-making. No complex ML pipelines, no GPU bills, just smart patterns that save time, money, and energy.

This session will help you do it better with fewer calls, less waste, and a lot more control.

Raphael De Lio

September 30, 2025

Transcript

  1. Reducing LLM calls with vector search design patterns

    Raphael De Lio. © 2025 Redis Ltd. All rights reserved.
  2. The new stack for AI agents

    Redis leads as the most-used tool for agent data and vector search.
  3. Not all context is good context

    Bigger input sizes hurt performance and your budget. GPT-5 API price: $1.25 / 1M input tokens, $10 / 1M output tokens.
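
The prices quoted on slide 3 make it easy to estimate what a larger prompt costs. The sketch below is only an illustration: the token counts and the daily call volume are hypothetical numbers, not figures from the talk.

```python
# Prices quoted on the slide, in USD per 1M tokens.
INPUT_PRICE_PER_M = 1.25
OUTPUT_PRICE_PER_M = 10.00


def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single LLM call at the quoted prices."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Hypothetical workload: these numbers are made up for illustration.
calls_per_day = 100_000
small_prompt = call_cost(input_tokens=500, output_tokens=100)
large_prompt = call_cost(input_tokens=20_000, output_tokens=100)

print(f"small prompt: ${small_prompt * calls_per_day:,.2f} per day")
print(f"large prompt: ${large_prompt * calls_per_day:,.2f} per day")
```

With the output size held constant, the daily bill in this sketch is driven almost entirely by the input size.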
  4. What we’re covering

    Vector search patterns for faster, cheaper, and greener performance:
    • Semantic classification
    • Semantic tool calling
    • Semantic caching
  5. What is a vector?

    A (-110, 500), B (465, -497), C (-167, -500), D (-178, -200), E (-195, -454)
  6. What is a vector?

    [Plot: points A to E on axes labelled Mass and Temp, each running from -500 to 500.] Each vector is a point in multi-dimensional space.
  7. Vector Search

    [Plot: the same points A to E on the Mass and Temp axes.] Finding similarity means measuring the distance between vectors.
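
The idea that similarity is distance can be tried directly on the five points from slides 5 to 7. This is a minimal sketch using Euclidean distance; the slides do not say which metric is meant, so that choice is an assumption, and `nearest` is a hypothetical helper name.

```python
import math

# The five 2-D points shown on the slides.
points = {
    "A": (-110, 500),
    "B": (465, -497),
    "C": (-167, -500),
    "D": (-178, -200),
    "E": (-195, -454),
}


def nearest(name):
    """Return the label of the point closest to `name`, excluding itself."""
    target = points[name]
    others = (label for label in points if label != name)
    return min(others, key=lambda label: math.dist(points[label], target))


for label in points:
    print(label, "is closest to", nearest(label))
```

A vector database does the same kind of comparison, only over hundreds of dimensions and with an index so it does not have to scan every point.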
  8. Embedding model

    Embedding models turn unstructured data (images, text) into vectors, for example:
    [0.0234, -0.1456, 0.0891, -0.2143, 0.1678, 0.0456, -0.0567, 0.2890, 0.0345, -0.1789, 0.0912, 0.1567, 0.1345, -0.0789, 0.0456, 0.1823, -0.0567, 0.0234, 0.0678, 0.1234, -0.0345, 0.0789, 0.1567, -0.0234, -0.1678, 0.0345, 0.1234, -0.0567, 0.0789, 0.1456, ... (continues for 384 total dimensions) ..., 0.0456, -0.0823, 0.1234, 0.0567, -0.1789, 0.0345]
  9. Vector representations enable similarity search

    "What's the capital of Germany?" is similar to "Which city is the capital of Germany?"
  10. The essentials of vector search

    Learn the fundamentals on our YouTube channel:
    https://www.youtube.com/watch?v=9NvO-VdjY80
    https://www.youtube.com/watch?v=Yhv19le0sBw
    https://www.youtube.com/watch?v=0U1S0WSsPuE
  11. Approach #1: Using an LLM

    Social media post: "Every repeated LLM call is money on fire. Redis 8 semantic caching understands meaning, not just keys. open.substack.com/pub/systemde..."
    Prompt: "Is this about Redis?"
    LLM response: true/false
    Every query runs through the model: simple, but expensive.
  12. Disadvantages

    Token consumption and time spent. High token consumption and wasted time add up quickly.
  13. Approach #2: Using a vector database

    Prompt to the LLM: "Generate 150 tweets about Redis". Sample output:
    • Pro tip: Use SCAN instead of KEYS in production. KEYS blocks the entire server while SCAN is non-blocking.
    • Remember when everyone said Redis is just a cache? Now it powers real-time leaderboards, pub/sub systems, full applications. Evolution in action.
    • PostgreSQL vs Redis for caching debate misses the point. Use Redis as L1 cache, PG as source of truth. Why choose when you can have both?
    • Our Redis instance has been running 847 days without restart. Rock solid stability 💪 #redis #uptime
    • [...]
  14. Approach #2: Using a vector database

    References: "Redis is the fastest tool for performing semantic caching", plus the generated tweets from the previous slide, [...]
    Embedding model: embed references. Vector database: store embeddings.
  15. Approach #2: Using a vector database

    New real post: "Every repeated LLM call is money on fire. Redis 8 semantic caching understands meaning, not just keys open.substack.com/pub/systemde..."
    Embedding model: embed. Vector database: similarity search.
    Closest reference: "Redis is the fastest tool for performing semantic caching"
    Similarity score: 0.2843. Is it similar enough?
  16. Classification self improvement

    New real post: "Every repeated LLM call is money on fire. Redis 8 semantic caching understands meaning, not just keys open.substack.com/pub/systemde..."
    Embedding model: embed. Vector database: similarity search.
    Closest reference: "Redis is the fastest tool for performing semantic caching"
    Similarity score: 0.2843. Add as new route reference.
  17. Classification hybrid approach

    New real post: "Redis 8 can scale to 1 billion vectors while keeping a median latency of 200ms"
    Embedding model: embed. Vector database: similarity search.
    Closest reference: "Our Redis instance has been running for 847 days without restart. Rock solid stability 💪 #redis #uptime"
    Similarity score: 0.693. If not similar enough: fallback to classification with LLM, then add as new route reference.
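
The classification flow of slides 11 to 17 (match against stored references, fall back to the LLM, then remember the result) can be sketched as follows. Everything here is an assumption made for illustration: `embed` is a toy bag-of-words stand-in for a real embedding model, `fake_llm` replaces the real LLM call, `SemanticClassifier` is a hypothetical helper rather than a Redis API, and the 0.5 threshold is arbitrary. The scores on the slides appear to behave like distances, where a lower value means a closer match; the sketch follows that reading.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_distance(a, b):
    """1 - cosine similarity: 0.0 for identical direction, 1.0 for no overlap."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return 1.0 - dot / norm if norm else 1.0


class SemanticClassifier:
    """Hypothetical helper: vector match first, LLM only as a fallback."""

    def __init__(self, references, threshold, llm_fallback):
        self.references = [embed(r) for r in references]
        self.threshold = threshold        # assumed cut-off; tune on real data
        self.llm_fallback = llm_fallback  # callable(text) -> bool
        self.llm_calls = 0

    def is_match(self, text):
        vector = embed(text)
        best = min((cosine_distance(vector, ref) for ref in self.references),
                   default=1.0)
        if best <= self.threshold:
            return True                   # similar enough: no LLM call
        self.llm_calls += 1               # not similar enough: ask the LLM
        verdict = self.llm_fallback(text)
        if verdict:
            self.references.append(vector)  # remember it as a new reference
        return verdict


def fake_llm(text):
    """Stub standing in for a real LLM call (assumption for this sketch)."""
    return "redis" in text.lower()


classifier = SemanticClassifier(
    references=["Redis is the fastest tool for performing semantic caching"],
    threshold=0.5,
    llm_fallback=fake_llm,
)
post = "Redis 8 semantic caching understands meaning, not just keys"
print(classifier.is_match(post), classifier.llm_calls)  # first time: LLM fallback
print(classifier.is_match(post), classifier.llm_calls)  # second time: vector match
```

Because every LLM-confirmed post becomes a reference, the share of requests that need the fallback should shrink as the reference set grows.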
  18. Approach #1: Using an LLM

    Agent: "These are the available tools". LLM: "Call tool X".
  19. Approach #2: Using a vector database

    • "What's the weather like?" → Tool: get_weather_default_city
    • "Will it rain today?" → Tool: get_weather_default_city
    • "Hello! What can you do?" → Tool: greeting_and_help
    • "Hello! How can you help me?" → Tool: greeting_and_help
    • "Do I have any notifications?" → Tool: new_notifications
    • "Read my notifications" → Tool: new_notifications
    • "Turn on the lights" → Tool: turn_on_the_lights_room
    • "Make the lights light" → Tool: turn_on_the_lights_room
  20. Approach #2: Using a vector database

    References: "What's the weather like?" and "Will it rain today?" (Tool: get_weather_default_city), "Hello! What can you do?" (Tool: greeting_and_help), [...]
    Embedding model: embed references. Vector database: store embeddings.
  21. Approach #2: Using a vector database

    User prompt: "Hey!! What are you capable of??"
    Embedding model: embed. Vector database: similarity search.
    Closest reference: "Hello! What can you do?"
    Similarity score: 0.0459. Is it similar enough? Tool: greeting_and_help
  22. Tool calling chunking

    User prompt: "Hey. I had a bad day yesterday. The weather was terrible and I crashed my bike onto a tree. Anyway, will it also rain today?"
    Embedding model: embed. Vector database: similarity search.
    Closest reference: "Hello! What can you do?"
    Similarity score: 0.928. Tool: greeting_and_help
  23. Tool calling chunking

    User prompt: "Hey. I had a bad day yesterday. The weather was terrible and I crashed my bike onto a tree. Anyway, will it also rain today?"
    Chunks: "Hey." / "I had a bad day yesterday. The weather was terrible and I crashed my bike onto a tree." / "Anyway, will it also rain today?"
    Embedding model: embed. Vector database: similarity search.
    Closest reference: "Will it rain today?"
    Similarity score: 0.274. Tool: get_weather_default_city
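
The tool-calling pattern of slides 19 to 23 can be sketched as a nearest-reference lookup over the reference phrases, run once per chunk of the prompt. The reference phrases and tool names come from slide 19; the rest is assumed for illustration: `embed` is a toy bag-of-words stand-in for a real embedding model, the prompt is split on sentence boundaries (the slides do not say how chunks are cut), `route` and `nearest_tool` are hypothetical names, and the 0.3 threshold is arbitrary.

```python
import math
import re
from collections import Counter

# Reference phrases and tool names taken from the slide.
TOOL_REFERENCES = {
    "What's the weather like?": "get_weather_default_city",
    "Will it rain today?": "get_weather_default_city",
    "Hello! What can you do?": "greeting_and_help",
    "Hello! How can you help me?": "greeting_and_help",
    "Do I have any notifications?": "new_notifications",
    "Read my notifications": "new_notifications",
    "Turn on the lights": "turn_on_the_lights_room",
    "Make the lights light": "turn_on_the_lights_room",
}


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_distance(a, b):
    """1 - cosine similarity: 0.0 for identical direction, 1.0 for no overlap."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return 1.0 - dot / norm if norm else 1.0


# Embed the references once, as a vector database would at indexing time.
INDEX = [(embed(phrase), tool) for phrase, tool in TOOL_REFERENCES.items()]


def nearest_tool(text):
    """Return (distance, tool) for the reference closest to `text`."""
    vector = embed(text)
    return min(((cosine_distance(vector, ref), tool) for ref, tool in INDEX),
               key=lambda pair: pair[0])


def route(prompt, threshold=0.3):
    """Split the prompt into sentences and route on the best-matching chunk.

    Returns a tool name, or None when nothing is close enough (the point at
    which a real system would fall back to the LLM).
    """
    chunks = [c for c in re.split(r"(?<=[.!?])\s+", prompt.strip()) if c]
    distance, tool = min((nearest_tool(c) for c in chunks),
                         key=lambda pair: pair[0], default=(1.0, None))
    return tool if distance <= threshold else None


prompt = ("Hey. I had a bad day yesterday. The weather was terrible and I "
          "crashed my bike onto a tree. Anyway, will it also rain today?")
print("whole prompt:", nearest_tool(prompt))
print("chunked:", route(prompt))
```

Embedding the whole prompt dilutes the one sentence that carries the request, which is why the per-chunk lookup finds a much closer reference than the whole-prompt lookup does.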
  24. Approach #1: Regular flow (no caching)

    Flow: chatbot input → read → LLM (generate) → output.
  25. Is there room for improvement?

    Token consumption. Time spent.
  26. Approach #2: Using a vector database

    Flow: chatbot input → read → embedding model (embed) → vector database (cache) → LLM (generate).
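
The cached flow on slide 26 can be sketched as a lookup before every LLM call: embed the question, search for a close enough stored question, and only call the model on a miss. As before, the details are assumptions for illustration: `embed` is a toy bag-of-words stand-in for a real embedding model, `SemanticCache` and `answer` are hypothetical names rather than a Redis API, `fake_llm` replaces the real model, and the 0.4 threshold is arbitrary.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_distance(a, b):
    """1 - cosine similarity: 0.0 for identical direction, 1.0 for no overlap."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return 1.0 - dot / norm if norm else 1.0


class SemanticCache:
    """Hypothetical in-memory cache keyed by meaning instead of exact text."""

    def __init__(self, threshold):
        self.threshold = threshold  # assumed cut-off; tune on real data
        self.entries = []           # list of (question vector, answer)

    def lookup(self, question):
        vector = embed(question)
        best = min(self.entries,
                   key=lambda entry: cosine_distance(vector, entry[0]),
                   default=None)
        if best is not None and cosine_distance(vector, best[0]) <= self.threshold:
            return best[1]
        return None

    def store(self, question, answer):
        self.entries.append((embed(question), answer))


def answer(question, cache, llm):
    """Serve from the cache when possible; otherwise call the LLM and store."""
    cached = cache.lookup(question)
    if cached is not None:
        return cached
    result = llm(question)
    cache.store(question, result)
    return result


llm_calls = []


def fake_llm(question):
    """Stub standing in for a real LLM call (assumption for this sketch)."""
    llm_calls.append(question)
    return f"(generated answer for: {question})"


cache = SemanticCache(threshold=0.4)
print(answer("What's the capital of Germany?", cache, fake_llm))
print(answer("Which city is the capital of Germany?", cache, fake_llm))
print("LLM calls:", len(llm_calls))
```

The threshold is the main design decision: too strict and the cache rarely hits, too loose and users get an answer to a different question.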
  27. Retrieval optimizer

    The retrieval optimizer is an open-source framework for systematically improving the performance of search and retrieval systems that run on Redis. It is designed to take you from “my search seems okay” to “I can prove this configuration is optimal” by combining benchmarking, experimentation, and automated optimization.