
Reduce LLM Calls with Vector Search

Raphael De Lio
September 30, 2025


LLMs are powerful, but calling them for everything gets expensive, slow, and energy-hungry fast. What if you could handle common tasks like classification, routing, and caching without reaching for a massive model every time?

In this session, I’ll show you how to use vector search and semantic patterns to build smarter systems that skip unnecessary LLM calls and still deliver. We’ll cover:

• How semantic classification can match intent without tokens or prompts
• How to route requests based on meaning, not brittle rules
• How semantic caching helps you reuse answers and cut costs

You’ll see how to replace brute-force prompting with clean, efficient logic using embeddings, similarity, and lightweight decision-making. No complex ML pipelines, no GPU bills, just smart patterns that save time, money, and energy.

This session will help you do it better with fewer calls, less waste, and a lot more control.



Transcript

  1. © 2026 Redis Ltd. All rights reserved. Reducing LLM Calls with Vector Search. Raphael De Lio.
  2. The new stack for AI agents: Redis leads as the most-used tool for agent data and vector search. Check the full survey at: https://survey.stackoverflow.co/2025/
  3. Not all context is good context. GPT-5 API price: $1.25 / 1M input tokens, $10 / 1M output tokens. Source: https://openai.com/index/introducing-gpt-5-for-developers/
  4. Precision is not improving. GPT-5.4 API price: $1.50 / 1M input tokens, $15 / 1M output tokens. Source: https://openai.com/index/introducing-gpt-5-4/
  5. What we're covering: semantic classification, semantic tool calling, and semantic caching. Vector search patterns for faster, cheaper, and greener performance.
  6. What is a vector? A (-110, 500), B (465, -497), C (-167, -500), D (-178, -200), E (-195, -454)
  7. What is a vector? Each vector is a point in multi-dimensional space. [Scatter plot of points A through E on axes running from -500 to 500]
  8. Vector search: finding similarity means measuring the distance between vectors. [Same scatter plot of points A through E]
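To make the distance idea concrete, here is a small plain-Python sketch using the five 2-D points from the slide; it finds the nearest neighbour of a point by Euclidean distance (a real vector database would do this over hundreds of dimensions, with an index instead of a linear scan):

```python
import math

# The five 2-D points from the slide
points = {
    "A": (-110, 500),
    "B": (465, -497),
    "C": (-167, -500),
    "D": (-178, -200),
    "E": (-195, -454),
}

def euclidean(p, q):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def nearest(name):
    """Return the label of the point closest to the named point."""
    return min((k for k in points if k != name),
               key=lambda k: euclidean(points[k], points[name]))

print(nearest("C"))  # E is only ~54 units away, far closer than D or B
```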
  9. Embedding model: embedding models turn unstructured data (images, text) into vectors, e.g. [0.0234, -0.1456, 0.0891, -0.2143, 0.1678, ... continues for 384 total dimensions ..., -0.1789, 0.0345]
  10. Vector representations enable similarity search: "What's the capital of Spain?" is similar to "Which city is the capital of Spain?"
  11. Vector search (recap): finding similarity means measuring the distance between vectors. [Same scatter plot of points A through E]
  12. The essentials of vector search: learn the fundamentals on the Redis YouTube channel. Videos: "Exact vs approximate nearest neighbors in vector databases", "What is a vector database?", "What is an embedding model?"
  13. Approach #1: using an LLM. A social media post ("Every repeated LLM call is money on fire. Redis 8 semantic caching understands meaning, not just keys. open.substack.com/pub/systemde...") is combined with the prompt "Is this about Redis?" and sent to the LLM, which returns a true/false response. Every query runs through the model: simple, but expensive.
  14. Disadvantages: token consumption and time spent. High token consumption and wasted time add up quickly.
  15. Approach #2: using a vector database. First, prompt an LLM to "Generate 150 social media posts about Redis" as reference examples, e.g.: "Pro tip: Use SCAN instead of KEYS in production. KEYS blocks the entire server while SCAN is non-blocking."; "Remember when everyone said Redis is just a cache? Now it powers real-time leaderboards, pub/sub systems, full applications. Evolution in action."; "PostgreSQL vs Redis for caching debate misses the point. Use Redis as L1 cache, PG as source of truth. Why choose when you can have both?"; "Our Redis instance has been running 847 days without restart. Rock solid stability 💪 #redis #uptime" [...]
  16. Approach #2: using a vector database. The reference posts ("Redis is the fastest tool for performing semantic caching", "Remember when everyone said Redis is just a cache? [...]", "PostgreSQL vs Redis for caching debate misses the point. [...]", "Our Redis instance has been running 847 days without restart. [...]" [...]) are embedded with the embedding model and the embeddings are stored in the vector database.
  17. Approach #2: using a vector database. A new real post ("Every repeated LLM call is money on fire. Redis 8 semantic caching understands meaning, not just keys. open.substack.com/pub/systemde...") is embedded and run through a similarity search. Nearest reference: "Redis is the fastest tool for performing semantic caching", similarity score 0.2843. Is it similar enough?
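A minimal sketch of this classification flow, with hand-made toy embeddings and a hypothetical 0.3 threshold standing in for a real embedding model and vector database. Scores here are cosine distances, where lower means more similar, which is how the scores on the slides read:

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity: 0.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

# Toy reference embeddings standing in for embedded reference posts in a vector DB
references = {
    "Redis is the fastest tool for performing semantic caching": [0.9, 0.1, 0.2],
    "PostgreSQL vs Redis for caching debate misses the point": [0.7, 0.4, 0.3],
}

def classify(post_embedding, threshold=0.3):
    """Return (is_about_redis, best_matching_reference) without calling an LLM."""
    best_text, best_dist = min(
        ((text, cosine_distance(post_embedding, emb))
         for text, emb in references.items()),
        key=lambda pair: pair[1])
    return best_dist <= threshold, best_text

# A new post whose (toy) embedding lands close to the first reference
matched, ref = classify([0.88, 0.15, 0.25])
```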
  18. Classification self-improvement: the same flow, but when the new post matches (similarity score 0.2843), it is added as a new route reference, so the reference set grows over time.
  19. Classification hybrid approach: a new real post ("Redis 8 can scale to 1 billion vectors while keeping a median latency of 200ms") is embedded and searched; the nearest reference ("Our Redis instance has been running 847 days without restart. Rock solid stability 💪 #redis #uptime") scores only 0.693. If it is not similar enough, fall back to classification with the LLM, then add the post as a new route reference.
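The hybrid flow could be sketched like this, with a toy in-memory reference store and a hypothetical `llm_classify` callable in place of a vector database and a real LLM call:

```python
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

# Toy reference store; a real system would keep these in a vector database
references = {
    "Our Redis instance has been running 847 days without restart": [0.8, 0.2, 0.1],
}

def hybrid_classify(text, embedding, llm_classify, threshold=0.3):
    """Vector search first; fall back to the LLM only on a weak match, then
    store the LLM-labelled post as a new route reference (self-improvement)."""
    best = min(cosine_distance(embedding, e) for e in references.values())
    if best <= threshold:
        return True
    verdict = llm_classify(text)        # expensive path, taken rarely
    if verdict:
        references[text] = embedding    # future similar posts skip the LLM
    return verdict
```

After one LLM fallback, a repeat of the same post is answered by vector search alone.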
  20. Approach #1: using an LLM. The agent sends the prompt plus "These are the available tools" to the LLM, and the LLM responds with "Call tool X".
  21. Approach #2: using a vector database. Example routes: "What's the weather like?" / "Will it rain today?" → get_weather_default_city; "Hello! What can you do?" / "Hello! How can you help me?" → greeting_and_help; "Do I have any notifications?" / "Read my notifications" → new_notifications; "Turn on the lights" / "Make the lights light" → turn_on_the_lights_room.
  22. Approach #2: using a vector database. The reference utterances and their tools ("What's the weather like?" → get_weather_default_city, "Will it rain today?" → get_weather_default_city, "Hello! What can you do?" → greeting_and_help, [...]) are embedded with the embedding model and stored in the vector database.
  23. Approach #2: using a vector database. The user prompt "Hey!! What are you capable of??" is embedded and searched; the nearest reference is "Hello! What can you do?" with similarity score 0.0459. Is it similar enough? Tool: greeting_and_help.
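A sketch of semantic tool calling under the same assumptions (toy embeddings, a hypothetical 0.3 distance threshold, lower score = closer):

```python
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

# Reference utterances as toy embeddings, each mapped to the tool it triggers
routes = [
    ([0.9, 0.1, 0.0], "get_weather_default_city"),  # "Will it rain today?"
    ([0.0, 0.9, 0.1], "greeting_and_help"),         # "Hello! What can you do?"
    ([0.1, 0.0, 0.9], "new_notifications"),         # "Read my notifications"
]

def route(query_embedding, threshold=0.3):
    """Pick the tool whose reference utterance is closest to the query, or
    None when nothing is similar enough (caller can fall back to an LLM)."""
    dist, tool = min((cosine_distance(query_embedding, emb), tool)
                     for emb, tool in routes)
    return tool if dist <= threshold else None
```

A prompt embedded near the greeting reference routes to `greeting_and_help`; an ambiguous one returns None.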
  24. Tool calling chunking: the user prompt "Hey. I had a bad day yesterday. The weather was terrible and I crashed my bike onto a tree. Anyway, will it also rain today?" is embedded whole and searched; the nearest reference, "Hello! What can you do?", scores only 0.928 (tool: greeting_and_help), a weak match that would mis-route the request.
  25. Tool calling chunking: the same prompt is first split into chunks ("Hey. Anyway, will it also rain today?" and "I had a bad day yesterday. The weather was terrible and I crashed my bike onto a tree."); the relevant chunk now matches "Will it rain today?" with similarity score 0.274. Tool: get_weather_default_city.
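The chunking step might look like this sketch: split the prompt into sentences, route each chunk separately, and take the best-scoring one. The `embed` function here is a deterministic bag-of-words stand-in just to keep the example self-contained and runnable, not a real embedding model:

```python
import math
import re

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0          # treat empty text as maximally distant
    return 1.0 - dot / (na * nb)

def embed(text):
    """Toy embedder: bucket words into an 8-dim count vector.
    A real system would call an embedding model here."""
    vec = [0.0] * 8
    for word in re.findall(r"[a-z]+", text.lower()):
        vec[sum(ord(c) for c in word) % 8] += 1.0
    return vec

routes = [
    (embed("Will it rain today?"), "get_weather_default_city"),
    (embed("Hello! What can you do?"), "greeting_and_help"),
]

def route_chunked(prompt, threshold=0.5):
    """Split into sentences, route each chunk, keep the strongest match."""
    chunks = [c for c in re.split(r"(?<=[.?!])\s+", prompt) if c.strip()]
    dist, tool = min((cosine_distance(embed(c), emb), tool)
                     for c in chunks for emb, tool in routes)
    return tool if dist <= threshold else None
```

The chatty filler sentences score poorly against every reference, while the "will it also rain today?" chunk lands close to the weather route.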
  26. Approach #1: regular flow (no caching). The chatbot reads the input, the LLM generates, and the output is returned.
  27. Is there room for improvement? Token consumption and time spent.
  28. Approach #2: using a vector database. The user prompt "What are the colors available for the Chevy Colorado?" is embedded and searched; the nearest cached question is "In what color is the Colorado available?" with similarity score 0.274. Is it similar enough? Yes: return the cached response to the user. No: run the regular agentic pipeline.
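A minimal in-memory sketch of a semantic cache with toy embeddings; a real deployment would use something like Redis LangCache with a proper embedding model:

```python
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

class SemanticCache:
    """Return a stored answer when a new question is close enough in
    embedding space, instead of re-running the whole agentic pipeline."""

    def __init__(self, threshold=0.3):
        self.threshold = threshold
        self.entries = []                 # list of (embedding, response)

    def get(self, embedding):
        if not self.entries:
            return None
        dist, response = min((cosine_distance(embedding, e), r)
                             for e, r in self.entries)
        return response if dist <= self.threshold else None

    def put(self, embedding, response):
        self.entries.append((embedding, response))
```

On a miss, the caller runs the regular pipeline and calls `put` so the next paraphrase of the same question becomes a hit.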
  29. Spring AI Advisors: advisors are interceptors that handle requests and responses in our AI applications. We can use them to perform actions before and/or after the request is sent, like enriching the request with more information or even cancelling the request to the chat model entirely. The chain runs Prompt → Advisor 1 before(1) → ... → Advisor N before(N) → Chat Model → after(N) → ... → after(1) → Response, with advised prompts flowing in and advised responses flowing out.
  30. Spring AI Advisors: SemanticGuardrailAdvisor (before: checks if the prompt is allowed; after: does nothing) and SemanticCachingAdvisor (before: checks the cache; after: stores the response in the cache). The prompt reaches the chat model only if it is allowed and there is no cache hit.
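Spring AI's actual advisors are Java classes; as a language-agnostic illustration of the same interceptor idea, here is a small Python sketch in which any advisor can modify the prompt, post-process the response, or short-circuit the chain entirely (an exact-match cache stands in for the semantic one):

```python
def run_with_advisors(prompt, advisors, chat_model):
    """Run before() hooks in order; any advisor may short-circuit the call
    (cache hit, blocked prompt). after() hooks run in reverse order."""
    for advisor in advisors:
        prompt, short_circuit = advisor.before(prompt)
        if short_circuit is not None:
            return short_circuit
    response = chat_model(prompt)
    for advisor in reversed(advisors):
        response = advisor.after(prompt, response)
    return response

class GuardrailAdvisor:
    def __init__(self, blocked_words):
        self.blocked = blocked_words

    def before(self, prompt):
        if any(w in prompt.lower() for w in self.blocked):
            return prompt, "Sorry, I can't help with that."
        return prompt, None

    def after(self, prompt, response):
        return response                    # guardrail does nothing after

class CachingAdvisor:
    def __init__(self):
        self.cache = {}                    # exact-match stand-in for a semantic cache

    def before(self, prompt):
        return prompt, self.cache.get(prompt)

    def after(self, prompt, response):
        self.cache[prompt] = response      # store for next time
        return response
```

With both advisors installed, a blocked prompt never reaches the model, and a repeated prompt is answered from the cache without a second model call.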
  31. Retrieval optimizer: the retrieval optimizer is an open-source framework for systematically improving the performance of search and retrieval systems that run on Redis. It is designed to take you from "my search seems okay" to "I can prove this configuration is optimal" by combining benchmarking, experimentation, and automated optimization.
  32. Check the full article at: https://redis.io/blog/benchmarking-results-for-vector-databases/
  33. In January alone, Redis LangCache saved our customers more than 5 billion LLM tokens. According to ChatGPT, that's approximately two trees saved per day. Our goal is to save a million trees this year.