
The Theory behind Vector DB

Yusuke Matsui
December 26, 2023


Retrieval Augmented Generation (RAG) is a popular method for adding information to Large Language Models (LLMs), and Vector DB is the core technology used for the retrieval part of RAG. In this presentation, I will focus on the mathematical background of the approximate nearest neighbor search algorithms that are central components of Vector DB. Specifically, I will provide details on graph-based search, the most widely used family of algorithms.

The 1st Japan-Korea Workshop on Artificial Intelligence, Dec. 27, 2023 @UTokyo

Yusuke Matsui (The University of Tokyo) https://yusukematsui.me/

Transcript

  1. Slides: bit.ly/41BTrrS 1 The Theory behind Vector DB Yusuke Matsui

    The University of Tokyo The 1st Japan-Korea Workshop on Artificial Intelligence, Dec. 27, 2023 @UTokyo
  2. Slides: bit.ly/41BTrrS 2 Yusuke Matsui ✓ Computer vision ✓ Large-scale

    indexing ✓ Data structure + Machine learning http://yusukematsui.me Lecturer (Assistant Professor), the University of Tokyo, Japan @utokyo_bunny ARM impl. of Faiss [Matsui+, ICASSP 22] Nearest neighbor search [Ono & Matsui, ACMMM 23] @matsui528 ML-enhanced Bloom Filter [Sato & Matsui, NeurIPS 23]
  3. Slides: bit.ly/41BTrrS 3 ➢ Introduction ✓ Demo: Embedding + Search

    + LLM ➢ The theory behind Vector DB ➢ Discussion
  4. Slides: bit.ly/41BTrrS 4 ➢ Introduction ✓ Demo: Embedding + Search

    + LLM ➢ The theory behind Vector DB ➢ Discussion
  5. Slides: bit.ly/41BTrrS 5 Texts are from: https://github.com/openai/openai-cookbook/blob/main/examples/Question_answering_using_embeddings.ipynb Icon credit: https://ja.wikipedia.org/wiki/ChatGPT

    "Who won curling gold at the 2022 Winter Olympics?" ChatGPT 3.5 (trained in 2021) “I'm sorry, but as an AI language model, I don't have information about the future events.” Ask  Want to add knowledge to LLM
  6. Slides: bit.ly/41BTrrS 6 Embedding + Search + LLM (RAG; Retrieval Augmented

    Generation) Texts are from: https://github.com/openai/openai-cookbook/blob/main/examples/Question_answering_using_embeddings.ipynb Icon credit: https://ja.wikipedia.org/wiki/ChatGPT "Who won curling gold at the 2022 Winter Olympics?" ChatGPT 3.5 (trained in 2021)
  7. Slides: bit.ly/41BTrrS 7 Texts are from: https://github.com/openai/openai-cookbook/blob/main/examples/Question_answering_using_embeddings.ipynb Icon credit: https://ja.wikipedia.org/wiki/ChatGPT

    "Who won curling gold at the 2022 Winter Olympics?" ChatGPT 3.5 (trained in 2021) “Damir Sharipzyanov¥n¥n=Career…” “Lviv bid for the 2022 Winter…” … “Chinami Yoshida¥n¥n==Personal…” Embedding + Search + LLM(RAG; Retrieval Augmented Generation)
  8. Slides: bit.ly/41BTrrS 8 𝒙1 , Texts are from: https://github.com/openai/openai-cookbook/blob/main/examples/Question_answering_using_embeddings.ipynb Icon

    credit: https://ja.wikipedia.org/wiki/ChatGPT "Who won curling gold at the 2022 Winter Olympics?" ChatGPT 3.5 (trained in 2021) “Damir Sharipzyanov\n\n=Career…” “Lviv bid for the 2022 Winter…” … “Chinami Yoshida\n\n==Personal…” Text Encoder Embedding + Search + LLM (RAG; Retrieval Augmented Generation)
  9. Slides: bit.ly/41BTrrS 9 𝒙1 , 𝒙2 , Texts are from:

    https://github.com/openai/openai-cookbook/blob/main/examples/Question_answering_using_embeddings.ipynb Icon credit: https://ja.wikipedia.org/wiki/ChatGPT "Who won curling gold at the 2022 Winter Olympics?" ChatGPT 3.5 (trained in 2021) “Damir Sharipzyanov\n\n=Career…” “Lviv bid for the 2022 Winter…” … “Chinami Yoshida\n\n==Personal…” Text Encoder Embedding + Search + LLM (RAG; Retrieval Augmented Generation)
  10. Slides: bit.ly/41BTrrS 10 𝒙1 , 𝒙2 , … , 𝒙𝑁

    Texts are from: https://github.com/openai/openai-cookbook/blob/main/examples/Question_answering_using_embeddings.ipynb Icon credit: https://ja.wikipedia.org/wiki/ChatGPT "Who won curling gold at the 2022 Winter Olympics?" ChatGPT 3.5 (trained in 2021) “Damir Sharipzyanov\n\n=Career…” “Lviv bid for the 2022 Winter…” … “Chinami Yoshida\n\n==Personal…” Text Encoder Embedding + Search + LLM (RAG; Retrieval Augmented Generation)
  11. Slides: bit.ly/41BTrrS 11 0.23 3.15 0.65 1.43 𝒙1 , 𝒙2

    , … , 𝒙𝑁 Texts are from: https://github.com/openai/openai-cookbook/blob/main/examples/Question_answering_using_embeddings.ipynb Icon credit: https://ja.wikipedia.org/wiki/ChatGPT "Who won curling gold at the 2022 Winter Olympics?" ChatGPT 3.5 (trained in 2021) “Damir Sharipzyanov\n\n=Career…” “Lviv bid for the 2022 Winter…” … Text Encoder “Chinami Yoshida\n\n==Personal…” Text Encoder Embedding + Search + LLM (RAG; Retrieval Augmented Generation)
  12. Slides: bit.ly/41BTrrS 12 0.23 3.15 0.65 1.43 0.20 3.25 0.72

    1.68 argminₙ ‖𝒒 − 𝒙𝑛‖₂² Result 𝒙1 , 𝒙2 , … , 𝒙𝑁 Texts are from: https://github.com/openai/openai-cookbook/blob/main/examples/Question_answering_using_embeddings.ipynb Icon credit: https://ja.wikipedia.org/wiki/ChatGPT "Who won curling gold at the 2022 Winter Olympics?" ChatGPT 3.5 (trained in 2021) Search “Damir Sharipzyanov\n\n=Career…” “Lviv bid for the 2022 Winter…” … Text Encoder “Chinami Yoshida\n\n==Personal…” Text Encoder “List of 2022 Winter Olympics medal winners…” Embedding + Search + LLM (RAG; Retrieval Augmented Generation)
  13. Slides: bit.ly/41BTrrS 13 0.23 3.15 0.65 1.43 0.20 3.25 0.72

    1.68 argminₙ ‖𝒒 − 𝒙𝑛‖₂² Result 𝒙1 , 𝒙2 , … , 𝒙𝑁 Texts are from: https://github.com/openai/openai-cookbook/blob/main/examples/Question_answering_using_embeddings.ipynb Icon credit: https://ja.wikipedia.org/wiki/ChatGPT "Who won curling gold at the 2022 Winter Olympics?" Search Update “Damir Sharipzyanov\n\n=Career…” “Lviv bid for the 2022 Winter…” … Text Encoder “Chinami Yoshida\n\n==Personal…” Text Encoder “Who won curling gold at the 2022 Winter Olympics? Use the below articles: List of 2022 Winter Olympics medal winners…” “List of 2022 Winter Olympics medal winners…” ChatGPT 3.5 (trained in 2021) Embedding + Search + LLM (RAG; Retrieval Augmented Generation)
  14. Slides: bit.ly/41BTrrS 14 0.23 3.15 0.65 1.43 0.20 3.25 0.72

    1.68 argminₙ ‖𝒒 − 𝒙𝑛‖₂² Result 𝒙1 , 𝒙2 , … , 𝒙𝑁 Texts are from: https://github.com/openai/openai-cookbook/blob/main/examples/Question_answering_using_embeddings.ipynb Icon credit: https://ja.wikipedia.org/wiki/ChatGPT "Who won curling gold at the 2022 Winter Olympics?" “Niklas Edin, Oskar Eriksson, …” Search Update ☺ “Damir Sharipzyanov\n\n=Career…” “Lviv bid for the 2022 Winter…” … Text Encoder “Chinami Yoshida\n\n==Personal…” Text Encoder “Who won curling gold at the 2022 Winter Olympics? Use the below articles: List of 2022 Winter Olympics medal winners…” “List of 2022 Winter Olympics medal winners…” ChatGPT 3.5 (trained in 2021) Embedding + Search + LLM (RAG; Retrieval Augmented Generation)
  15. Slides: bit.ly/41BTrrS 15 0.23 3.15 0.65 1.43 0.20 3.25 0.72

    1.68 argminₙ ‖𝒒 − 𝒙𝑛‖₂² Result 𝒙1 , 𝒙2 , … , 𝒙𝑁 Texts are from: https://github.com/openai/openai-cookbook/blob/main/examples/Question_answering_using_embeddings.ipynb Icon credit: https://ja.wikipedia.org/wiki/ChatGPT "Who won curling gold at the 2022 Winter Olympics?" “Niklas Edin, Oskar Eriksson, …” Search Update ☺ “Damir Sharipzyanov\n\n=Career…” “Lviv bid for the 2022 Winter…” … Text Encoder “Chinami Yoshida\n\n==Personal…” Text Encoder “Who won curling gold at the 2022 Winter Olympics? Use the below articles: List of 2022 Winter Olympics medal winners…” “List of 2022 Winter Olympics medal winners…” ChatGPT 3.5 (trained in 2021) RAG is the simplest method to supplement an LLM with knowledge Embedding + Search + LLM (RAG; Retrieval Augmented Generation)
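
The "Search" step above is plain nearest neighbor search, argminₙ ‖𝒒 − 𝒙𝑛‖₂². A minimal numpy sketch of that single step, reusing the toy 4-D numbers shown on the slide (the second database row and the printed result are made up for illustration; real text embeddings have hundreds to thousands of dimensions):

import numpy as np

# Database embeddings x_1, ..., x_N (here N = 2; the first row mirrors the slide's toy vector)
X = np.array([[0.23, 3.15, 0.65, 1.43],   # x_1, e.g., "Chinami Yoshida..."
              [5.61, 0.42, 2.37, 0.19]])  # x_2, made-up values for another article
q = np.array([0.20, 3.25, 0.72, 1.68])    # encoded query

# Exact nearest neighbor: argmin_n ||q - x_n||_2^2
n_best = int(np.argmin(np.sum((X - q) ** 2, axis=1)))
print(n_best)  # -> 0, i.e., x_1 is the article fed back to the LLM
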
  16. Slides: bit.ly/41BTrrS 17 DEMO1: OpenAI Playground (fully managed service) Turn

    on “Retrieval” Upload Olympic information ☺ Referring to the uploaded information
  17. Slides: bit.ly/41BTrrS 18 DEMO1: OpenAI Playground (fully managed service) Turn

    on “Retrieval” Upload Olympic information ☺ Referring to the uploaded information
  18. Slides: bit.ly/41BTrrS 19 DEMO2: Run on your PC (pseudo code)

    https://github.com/openai/openai-cookbook/blob/main/examples/Question_answering_using_embeddings.ipynb # Embed knowledge texts = [ "Chinami Yoshida\n\n==Persona...", "Lviv bid for the 2022 Winter...", ... ] features = encode(texts) # Embed a question question = "Who won curling gold at the 2022 Winter Olympics?" query_feature = encode(question) # Search ids = search(query_feature, features) # Add results to the question question += "Use the below articles" for id in ids: question += texts[id] # Ask LLM result = LLM.ask(question)
  19. Slides: bit.ly/41BTrrS 20 DEMO2: Run on your PC (pseudo code)

    # Embed knowledge texts = [ "Chinami Yoshida\n\n==Persona...", "Lviv bid for the 2022 Winter...", ... ] features = encode(texts) # Embed a question question = "Who won curling gold at the 2022 Winter Olympics?" query_feature = encode(question) # Search ids = search(query_feature, features) # Add results to the question question += "Use the below articles" for id in ids: question += texts[id] # Ask LLM result = LLM.ask(question) ➢ Any encoders are fine ➢ Classic methods (e.g., BM25) are also fine ➢ Modern methods are also fine (but you need GPUs) ➢ OpenAI API is also fine (but you need to pay) How to prepare these? Search on either CPUs or GPUs (Accuracy vs Runtime vs Memory vs Money) ➢ You need GPUs to run it on your local machine. You need to prepare an executable model such as LLaMa ➢ Or, you can use OpenAI API. No need to prepare GPUs, but need to pay https://github.com/openai/openai-cookbook/blob/main/examples/Question_answering_using_embeddings.ipynb
  20. Slides: bit.ly/41BTrrS 21 DEMO2: Run on your PC (pseudo code)

    # Embed knowledge texts = [ "Chinami Yoshida\n\n==Persona...", "Lviv bid for the 2022 Winter...", ... ] features = encode(texts) # Embed a question question = "Who won curling gold at the 2022 Winter Olympics?" query_feature = encode(question) # Search ids = search(query_feature, features) # Add results to the question question += "Use the below articles" for id in ids: question += texts[id] # Ask LLM result = LLM.ask(question) ➢ Any encoders are fine ➢ Classic methods (e.g., BM25) are also fine ➢ Modern methods are also fine (but you need GPUs) ➢ OpenAI API is also fine (but you need to pay) How to prepare these? Search on either CPUs or GPUs (Accuracy vs Runtime vs Memory vs Money) ➢ You need GPUs to run it on your local machine. You need to prepare an executable model such as LLaMa ➢ Or, you can use OpenAI API. No need to prepare GPUs, but need to pay ➢ “Embedding + Search” is traditional software engineering-ish ➢ API call or not ✓ Easy, but costly ✓ Reproducibility for research? https://github.com/openai/openai-cookbook/blob/main/examples/Question_answering_using_embeddings.ipynb
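
A runnable version of the pseudocode above, as a sketch only: it assumes sentence-transformers as the encoder and plain numpy for the search; the model name (all-MiniLM-L6-v2), k = 2, and the final LLM call are placeholder choices of mine, not something the slides prescribe.

import numpy as np
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

# Embed knowledge
texts = [
    "Chinami Yoshida\n\n==Personal...",
    "Lviv bid for the 2022 Winter...",
    "List of 2022 Winter Olympics medal winners...",
]
encoder = SentenceTransformer("all-MiniLM-L6-v2")   # any encoder works (BM25, OpenAI API, ...)
features = encoder.encode(texts)                    # shape (N, D)

# Embed a question
question = "Who won curling gold at the 2022 Winter Olympics?"
query_feature = encoder.encode([question])[0]       # shape (D,)

# Search: exact nearest neighbors by squared L2 distance
dists = np.sum((features - query_feature) ** 2, axis=1)
ids = np.argsort(dists)[:2]                         # top-k article IDs (k = 2 here)

# Add results to the question
prompt = question + "\nUse the below articles:\n"
for i in ids:
    prompt += texts[i] + "\n"

# Ask LLM: call your LLM of choice here (OpenAI API, a local LLaMA, ...)
print(prompt)
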
  21. Slides: bit.ly/41BTrrS 22 ➢ Introduction ✓ Demo: Embedding + Search

    + LLM ➢ The theory behind Vector DB ➢ Discussion
  22. Slides: bit.ly/41BTrrS 23 0.23 3.15 0.65 1.43 0.20 3.25 0.72

    1.68 argmin 𝒒 − 𝒙𝑛 2 2 Result 𝒙1 , 𝒙2 , … , 𝒙𝑁 Texts are from: https://github.com/openai/openaicookbook/blob/main/examples/Question_answering_using_embeddings.ipynb Icon credit: https://ja.wikipedia.org/wiki/ChatGPT "Who won curling gold at the 2022 Winter Olympics?" “Niklas Edin, Oskar Eriksson, …” Search Update ☺ “Damir Sharipzyanov¥n¥n=Career…” “Lviv bid for the 2022 Winter…” … Text Encoder “Chinami Yoshida¥n¥n==Personal…” Text Encoder “Who won curling gold at the 2022 Winter Olympics? Use the bellow articles: List of 2022 Winter Olympics medal winners…” “List of 2022 Winter Olympics medal winners…” ChatGPT 3.5 (trained in 2021) RAG is the simplest method to supplement knowledge to LLM Embedding + Search + LLM(RAG; Retrieval Augmented Generation)
  23. Slides: bit.ly/41BTrrS 24 0.23 3.15 0.65 1.43 0.20 3.25 0.72

    1.68 argmin 𝒒 − 𝒙𝑛 2 2 Result 𝒙1 , 𝒙2 , … , 𝒙𝑁 Texts are from: https://github.com/openai/openaicookbook/blob/main/examples/Question_answering_using_embeddings.ipynb Icon credit: https://ja.wikipedia.org/wiki/ChatGPT "Who won curling gold at the 2022 Winter Olympics?" “Niklas Edin, Oskar Eriksson, …” Search Update ☺ “Damir Sharipzyanov¥n¥n=Career…” “Lviv bid for the 2022 Winter…” … Text Encoder “Chinami Yoshida¥n¥n==Personal…” Text Encoder “Who won curling gold at the 2022 Winter Olympics? Use the bellow articles: List of 2022 Winter Olympics medal winners…” “List of 2022 Winter Olympics medal winners…” ChatGPT 3.5 (trained in 2021) RAG is the simplest method to supplement knowledge to LLM Embedding + Search + LLM(RAG; Retrieval Augmented Generation) ➢ Approx. nearest neighbor search! ➢ Accuracy vs. runtime vs. memory ➢ Vector DB??
  24. Slides: bit.ly/41BTrrS 25 Three levels of technology Milvus Pinecone Qdrant

    ScaNN (4-bit PQ) [Guo+, ICML 2020] Algorithm ➢ Scientific paper ➢ Math ➢ Often, by researchers Library ➢ Implementations of algorithms ➢ Usually, a search function only ➢ By researchers, developers, etc Service (e.g., vector DB) ➢ Library + (handling metadata, serving, scaling, IO, CRUD, etc) ➢ Usually, by companies Product Quantization + Inverted Index (PQ, IVFPQ) [Jégou+, TPAMI 2011] Hierarchical Navigable Small World (HNSW) [Malkov+, TPAMI 2019] Weaviate Vertex AI Matching Engine faiss NMSLIB hnswlib Vald ScaNN jina
  25. Slides: bit.ly/41BTrrS Three levels of technology 26 Milvus Pinecone Qdrant

    ScaNN (4-bit PQ) [Guo+, ICML 2020] Algorithm ➢ Scientific paper ➢ Math ➢ Often, by researchers Library ➢ Implementations of algorithms ➢ Usually, a search function only ➢ By researchers, developers, etc Service (e.g., vector DB) ➢ Library + (handling metadata, serving, scaling, IO, CRUD, etc) ➢ Usually, by companies Weaviate Vertex AI Matching Engine NMSLIB hnswlib Vald ScaNN jina Product Quantization + Inverted Index (PQ, IVFPQ) [Jégou+, TPAMI 2011] Hierarchical Navigable Small World (HNSW) [Malkov+, TPAMI 2019] faiss One library may implement multiple algorithms ☹ “I benchmarked faiss” ☺ “I benchmarked PQ in faiss”
  26. Slides: bit.ly/41BTrrS Three levels of technology 27 Milvus Pinecone Qdrant

    ScaNN (4-bit PQ) [Guo+, ICML 2020] Algorithm ➢ Scientific paper ➢ Math ➢ Often, by researchers Library ➢ Implementations of algorithms ➢ Usually, a search function only ➢ By researchers, developers, etc Service (e.g., vector DB) ➢ Library + (handling metadata, serving, scaling, IO, CRUD, etc) ➢ Usually, by companies Product Quantization + Inverted Index (PQ, IVFPQ) [Jégou+, TPAMI 2011] Weaviate Vertex AI Matching Engine Vald ScaNN jina Hierarchical Navigable Small World (HNSW) [Malkov+, TPAMI 2019] faiss NMSLIB hnswlib One algorithm may be implemented in multiple libraries
  27. Slides: bit.ly/41BTrrS Three levels of technology 28 Milvus Pinecone Qdrant

    Algorithm ➢ Scientific paper ➢ Math ➢ Often, by researchers Library ➢ Implementations of algorithms ➢ Usually, a search function only ➢ By researchers, developers, etc Service (e.g., vector DB) ➢ Library + (handling metadata, serving, scaling, IO, CRUD, etc) ➢ Usually, by companies Product Quantization + Inverted Index (PQ, IVFPQ) [Jégou+, TPAMI 2011] Hierarchical Navigable Small World (HNSW) [Malkov+, TPAMI 2019] Weaviate Vertex AI Matching Engine faiss NMSLIB hnswlib Vald jina ScaNN (4-bit PQ) [Guo+, ICML 2020] ScaNN Often, one library = one algorithm
  28. Slides: bit.ly/41BTrrS Three levels of technology 29 Pinecone Qdrant ScaNN

    (4-bit PQ) [Guo+, ICML 2020] Algorithm ➢ Scientific paper ➢ Math ➢ Often, by researchers Library ➢ Implementations of algorithms ➢ Usually, a search function only ➢ By researchers, developers, etc Service (e.g., vector DB) ➢ Library + (handling metadata, serving, scaling, IO, CRUD, etc) ➢ Usually, by companies Product Quantization + Inverted Index (PQ, IVFPQ) [Jégou+, TPAMI 2011] Vertex AI Matching Engine NMSLIB Vald ScaNN jina Weaviate Milvus faiss hnswlib Hierarchical Navigable Small World (HNSW) [Malkov+, TPAMI 2019] One service may use some libraries … or re-implement algorithms from scratch (e.g., in Go)
  29. Slides: bit.ly/41BTrrS 30 Three levels of technology Milvus Pinecone Qdrant

    ScaNN (4-bit PQ) [Guo+, ICML 2020] Algorithm ➢ Scientific paper ➢ Math ➢ Often, by researchers Library ➢ Implementations of algorithms ➢ Usually, a search function only ➢ By researchers, developers, etc Service (e.g., vector DB) ➢ Library + (handling metadata, serving, scaling, IO, CRUD, etc) ➢ Usually, by companies Product Quantization + Inverted Index (PQ, IVFPQ) [Jégou+, TPAMI 2011] Hierarchical Navigable Small World (HNSW) [Malkov+, TPAMI 2019] Weaviate Vertex AI Matching Engine faiss NMSLIB hnswlib Vald ScaNN jina This talk mainly focuses on algorithms
  30. Slides: bit.ly/41BTrrS 31 𝑁: 10⁶ (million-scale) … 10⁹ (billion-scale) Locality Sensitive

    Hashing (LSH) Tree / Space Partitioning Graph traversal Space partition ➢ k-means ➢ PQ/OPQ ➢ Graph traversal ➢ etc… Data compression ➢ Raw data ➢ Scalar quantization ➢ PQ/OPQ ➢ etc… Look-up-based Hamming-based Linear-scan by Asymmetric Distance … Linear-scan by Hamming distance Inverted index + data compression For raw data: Acc. ☺, Memory: ☹ For compressed data: Acc. ☹, Memory: ☺
  31. Slides: bit.ly/41BTrrS 32 𝑁: 10⁶ (million-scale) … 10⁹ (billion-scale) Locality Sensitive

    Hashing (LSH) Tree / Space Partitioning Graph traversal Space partition ➢ k-means ➢ PQ/OPQ ➢ Graph traversal ➢ etc… Data compression ➢ Raw data ➢ Scalar quantization ➢ PQ/OPQ ➢ etc… Look-up-based Hamming-based Linear-scan by Asymmetric Distance … Linear-scan by Hamming distance Inverted index + data compression For raw data: Acc. ☺, Memory: ☹ For compressed data: Acc. ☹, Memory: ☺ De facto standard! Will explain
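
To make "data compression + linear-scan by asymmetric distance" concrete, here is a toy product-quantization-style sketch (random data, k-means from scikit-learn). It illustrates the idea only; it is not the faiss/ScaNN implementation, and all sizes (N, D, M, K) are arbitrary.

# Each vector is split into M sub-vectors; each sub-vector is replaced by the ID of its
# nearest sub-codeword, so a D-dim float vector becomes M small integers.
# At query time, distances are computed between the *raw* query and the *compressed*
# database via per-subspace lookup tables (hence "asymmetric").
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
N, D, M, K = 1000, 16, 4, 16           # M subspaces, K codewords per subspace
X = rng.normal(size=(N, D)).astype(np.float32)
Ds = D // M                             # dimension of each subspace

# Train: one small codebook per subspace
codebooks = [KMeans(n_clusters=K, n_init=3, random_state=0).fit(X[:, m*Ds:(m+1)*Ds])
             for m in range(M)]

# Encode: each vector -> M code IDs (the compressed representation)
codes = np.stack([cb.predict(X[:, m*Ds:(m+1)*Ds]) for m, cb in enumerate(codebooks)], axis=1)

# Search: build an (M, K) table of squared distances from the query's sub-vectors to every
# codeword, then sum table entries along each stored code (asymmetric distance computation).
q = rng.normal(size=D).astype(np.float32)
table = np.stack([np.sum((cb.cluster_centers_ - q[m*Ds:(m+1)*Ds]) ** 2, axis=1)
                  for m, cb in enumerate(codebooks)])          # shape (M, K)
approx_dists = table[np.arange(M), codes].sum(axis=1)          # one value per database item
print(np.argsort(approx_dists)[:5])                            # approximate top-5 IDs
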
  32. Slides: bit.ly/41BTrrS 33 Graph search ➢ De facto standard if

    all data can be loaded into memory ➢ Fast and accurate for real-world data Images are from [Malkov+, Information Systems, 2013] ➢ Traverse the graph towards the query ➢ Seems intuitive, but not so easy to understand ➢ Review the algorithm carefully
  33. Slides: bit.ly/41BTrrS 34 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M ➢ Given a query vector Candidates (size = 3) Close to the query Name each node for explanation
  34. Slides: bit.ly/41BTrrS 35 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M ➢ Given a query vector ➢ Start from an entry point (e.g., ) Candidates (size = 3) Close to the query M
  35. Slides: bit.ly/41BTrrS 36 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M ➢ Given a query vector ➢ Start from an entry point (e.g., ). Record the distance to q. Candidates (size = 3) Close to the query M M 23.1
  36. Slides: bit.ly/41BTrrS 37 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M Candidates (size = 3) Close to the query 23.1 M
  37. Slides: bit.ly/41BTrrS 38 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M Candidates (size = 3) Close to the query M 23.1 1st iteration
  38. Slides: bit.ly/41BTrrS 39 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M ➢ Pick up the unchecked best candidate ( ) Candidates (size = 3) Close to the query M M 23.1 Best Best
  39. Slides: bit.ly/41BTrrS 40 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M ➢ Pick up the unchecked best candidate ( ). Check it. Candidates (size = 3) Close to the query M M 23.1 Best Best check!
  40. Slides: bit.ly/41BTrrS 41 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M Candidates (size = 3) Close to the query 23.1 Best ➢ Pick up the unchecked best candidate ( ). Check it. ➢ Find the connected points. Best M M check!
  41. Slides: bit.ly/41BTrrS 42 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M Candidates (size = 3) Close to the query 23.1 Best ➢ Pick up the unchecked best candidate ( ). Check it. ➢ Find the connected points. ➢ Record the distances to q. N M M check!
  42. Slides: bit.ly/41BTrrS 43 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M Candidates (size = 3) Close to the query Best ➢ Pick up the unchecked best candidate ( ). Check it. ➢ Find the connected points. ➢ Record the distances to q. N M J 11.1 N 15.3 K 19.4 M 23.1 check!
  43. Slides: bit.ly/41BTrrS 44 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M Candidates (size = 3) Close to the query Best ➢ Pick up the unchecked best candidate ( ). Check it. ➢ Find the connected points. ➢ Record the distances to q. N M J 11.1 N 15.3 K 19.4 M 23.1
  44. Slides: bit.ly/41BTrrS 45 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M Candidates (size = 3) Close to the query Best ➢ Pick up the unchecked best candidate ( ). Check it. ➢ Find the connected points. ➢ Record the distances to q. ➢ Maintain the candidates (size=3) N M J 11.1 N 15.3 K 19.4 M 23.1
  45. Slides: bit.ly/41BTrrS 46 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M Candidates (size = 3) Close to the query Best ➢ Pick up the unchecked best candidate ( ). Check it. ➢ Find the connected points. ➢ Record the distances to q. ➢ Maintain the candidates (size=3) N M J 11.1 N 15.3 K 19.4
  46. Slides: bit.ly/41BTrrS 47 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M Candidates (size = 3) Close to the query N J 11.1 N 15.3 K 19.4
  47. Slides: bit.ly/41BTrrS 48 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M Candidates (size = 3) Close to the query N J 11.1 N 15.3 K 19.4 2nd iteration
  48. Slides: bit.ly/41BTrrS 49 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M Candidates (size = 3) Close to the query N J 11.1 N 15.3 K 19.4 ➢ Pick up the unchecked best candidate ( ) J Best Best
  49. Slides: bit.ly/41BTrrS 50 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M Candidates (size = 3) Close to the query N J 11.1 N 15.3 K 19.4 ➢ Pick up the unchecked best candidate ( ). Check it. J Best Best check!
  50. Slides: bit.ly/41BTrrS 51 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M Candidates (size = 3) Close to the query N J 11.1 N 15.3 K 19.4 ➢ Pick up the unchecked best candidate ( ). Check it. ➢ Find the connected points. J Best Best check!
  51. Slides: bit.ly/41BTrrS 52 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M Candidates (size = 3) Close to the query N J 11.1 N 15.3 K 19.4 ➢ Pick up the unchecked best candidate ( ). Check it. ➢ Find the connected points. ➢ Record the distances to q. J Best 13.2 9.7 check! Already visited
  52. Slides: bit.ly/41BTrrS 53 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M Candidates (size = 3) Close to the query N ➢ Pick up the unchecked best candidate ( ). Check it. ➢ Find the connected points. ➢ Record the distances to q. J Best 13.2 9.7 J 11.1 N 15.3 B 2.3 G 3.5 I 9.7 F 10.2 L 13.2 check! Already visited
  53. Slides: bit.ly/41BTrrS 54 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M Candidates (size = 3) Close to the query N ➢ Pick up the unchecked best candidate ( ). Check it. ➢ Find the connected points. ➢ Record the distances to q. J Best J 11.1 N 15.3 B 2.3 G 3.5 I 9.7 F 10.2 L 13.2
  54. Slides: bit.ly/41BTrrS 55 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M Candidates (size = 3) Close to the query N ➢ Pick up the unchecked best candidate ( ). Check it. ➢ Find the connected points. ➢ Record the distances to q. ➢ Maintain the candidates (size=3) J Best J 11.1 N 15.3 B 2.3 G 3.5 I 9.7 F 10.2 L 13.2
  55. Slides: bit.ly/41BTrrS 56 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M Candidates (size = 3) Close to the query N ➢ Pick up the unchecked best candidate ( ). Check it. ➢ Find the connected points. ➢ Record the distances to q. ➢ Maintain the candidates (size=3) J Best B 2.3 G 3.5 I 9.7
  56. Slides: bit.ly/41BTrrS 57 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M Candidates (size = 3) Close to the query N B 2.3 G 3.5 I 9.7
  57. Slides: bit.ly/41BTrrS 58 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M Candidates (size = 3) Close to the query N B 2.3 G 3.5 I 9.7 3rd iteration
  58. Slides: bit.ly/41BTrrS 59 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M Candidates (size = 3) Close to the query N B 2.3 G 3.5 I 9.7 Best Best ➢ Pick up the unchecked best candidate ( ) B
  59. Slides: bit.ly/41BTrrS 60 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M Candidates (size = 3) Close to the query N B 2.3 G 3.5 I 9.7 Best Best ➢ Pick up the unchecked best candidate ( ). Check it. B check!
  60. Slides: bit.ly/41BTrrS 61 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M Candidates (size = 3) Close to the query N B 2.3 G 3.5 I 9.7 Best Best ➢ Pick up the unchecked best candidate ( ). Check it. ➢ Find the connected points. B check!
  61. Slides: bit.ly/41BTrrS 62 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M Candidates (size = 3) Close to the query N B 2.3 G 3.5 I 9.7 Best ➢ Pick up the unchecked best candidate ( ). Check it. ➢ Find the connected points. ➢ Record the distances to q. B 0.5 2.1 check!
  62. Slides: bit.ly/41BTrrS 63 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M Candidates (size = 3) Close to the query N Best ➢ Pick up the unchecked best candidate ( ). Check it. ➢ Find the connected points. ➢ Record the distances to q. B 0.5 2.1 C 0.5 D 2.1 A 3.6 B 2.3 G 3.5 I 9.7 check!
  63. Slides: bit.ly/41BTrrS 64 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M Candidates (size = 3) Close to the query N ➢ Pick up the unchecked best candidate ( ). Check it. ➢ Find the connected points. ➢ Record the distances to q. B C 0.5 D 2.1 A 3.6 B 2.3 G 3.5 I 9.7 Best
  64. Slides: bit.ly/41BTrrS 65 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M Candidates (size = 3) Close to the query N ➢ Pick up the unchecked best candidate ( ). Check it. ➢ Find the connected points. ➢ Record the distances to q. ➢ Maintain the candidates (size=3) B C 0.5 D 2.1 A 3.6 B 2.3 G 3.5 I 9.7 Best
  65. Slides: bit.ly/41BTrrS 66 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M Candidates (size = 3) Close to the query N ➢ Pick up the unchecked best candidate ( ). Check it. ➢ Find the connected points. ➢ Record the distances to q. ➢ Maintain the candidates (size=3) B C 0.5 D 2.1 B 2.3 Best
  66. Slides: bit.ly/41BTrrS 67 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M Candidates (size = 3) Close to the query N C 0.5 D 2.1 B 2.3
  67. Slides: bit.ly/41BTrrS 68 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M Candidates (size = 3) Close to the query N C 0.5 D 2.1 B 2.3 4th iteration
  68. Slides: bit.ly/41BTrrS 69 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M Candidates (size = 3) Close to the query N C 0.5 D 2.1 B 2.3 ➢ Pick up the unchecked best candidate ( ). C Best Best
  69. Slides: bit.ly/41BTrrS 70 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M Candidates (size = 3) Close to the query N C 0.5 D 2.1 B 2.3 ➢ Pick up the unchecked best candidate ( ). Check it. C Best Best check!
  70. Slides: bit.ly/41BTrrS 71 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M Candidates (size = 3) Close to the query N C 0.5 D 2.1 B 2.3 ➢ Pick up the unchecked best candidate ( ). Check it. ➢ Find the connected points. C Best Best check!
  71. Slides: bit.ly/41BTrrS 72 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M Candidates (size = 3) Close to the query N C 0.5 D 2.1 B 2.3 ➢ Pick up the unchecked best candidate ( ). Check it. ➢ Find the connected points. ➢ Record the distances to q. C Best check! Already visited Already visited Already visited Already visited
  72. Slides: bit.ly/41BTrrS 73 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M Candidates (size = 3) Close to the query N C 0.5 D 2.1 B 2.3 ➢ Pick up the unchecked best candidate ( ). Check it. ➢ Find the connected points. ➢ Record the distances to q. ➢ Maintain the candidates (size=3) C Best
  73. Slides: bit.ly/41BTrrS 74 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M Candidates (size = 3) Close to the query N C 0.5 D 2.1 B 2.3
  74. Slides: bit.ly/41BTrrS 75 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M Candidates (size = 3) Close to the query N C 0.5 D 2.1 B 2.3 5th iteration
  75. Slides: bit.ly/41BTrrS 76 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M Candidates (size = 3) Close to the query N C 0.5 D 2.1 B 2.3 ➢ Pick up the unchecked best candidate ( ). D Best Best
  76. Slides: bit.ly/41BTrrS 77 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M Candidates (size = 3) Close to the query N C 0.5 D 2.1 B 2.3 ➢ Pick up the unchecked best candidate ( ). Check it. D Best Best check!
  77. Slides: bit.ly/41BTrrS 78 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M Candidates (size = 3) Close to the query N C 0.5 D 2.1 B 2.3 ➢ Pick up the unchecked best candidate ( ). Check it. ➢ Find the connected points. D Best Best check!
  78. Slides: bit.ly/41BTrrS 79 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M Candidates (size = 3) Close to the query N C 0.5 D 2.1 B 2.3 ➢ Pick up the unchecked best candidate ( ). Check it. ➢ Find the connected points. ➢ Record the distances to q. D Best check! Already visited Already visited
  79. Slides: bit.ly/41BTrrS 80 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M Candidates (size = 3) Close to the query N C 0.5 D 2.1 B 2.3 ➢ Pick up the unchecked best candidate ( ). Check it. ➢ Find the connected points. ➢ Record the distances to q. D Best check! Already visited Already visited H 3.9
  80. Slides: bit.ly/41BTrrS 81 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M Candidates (size = 3) Close to the query N C 0.5 D 2.1 B 2.3 ➢ Pick up the unchecked best candidate ( ). Check it. ➢ Find the connected points. ➢ Record the distances to q. D Best H 3.9
  81. Slides: bit.ly/41BTrrS 82 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M Candidates (size = 3) Close to the query N C 0.5 D 2.1 B 2.3 ➢ Pick up the unchecked best candidate ( ). Check it. ➢ Find the connected points. ➢ Record the distances to q. ➢ Maintain the candidates (size=3) D Best H 3.9
  82. Slides: bit.ly/41BTrrS 83 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M Candidates (size = 3) Close to the query N C 0.5 D 2.1 B 2.3 ➢ Pick up the unchecked best candidate ( ). Check it. ➢ Find the connected points. ➢ Record the distances to q. ➢ Maintain the candidates (size=3) D Best
  83. Slides: bit.ly/41BTrrS 84 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M Candidates (size = 3) Close to the query N C 0.5 D 2.1 B 2.3 ➢ All candidates are checked. Finish. ➢ Here, C is the closest to the query
  84. Slides: bit.ly/41BTrrS 85 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M Candidates (size = 3) Close to the query N C 0.5 D 2.1 B 2.3 Final output 1: Candidates ➢ You can pick up top-k results ➢ All candidates are checked. Finish. ➢ Here, C is the closest to the query
  85. Slides: bit.ly/41BTrrS 86 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M Candidates (size = 3) Close to the query N C 0.5 D 2.1 B 2.3 Final output 1: Candidates ➢ You can pick up top-k results ➢ All candidates are checked. Finish. ➢ Here, C is the closest to the query Final output 2: Checked items ➢ i.e., search path
  86. Slides: bit.ly/41BTrrS 87 Search Images are from [Malkov+, Information Systems,

    2013] A B C D E F G H I J K L N M Candidates (size = 3) Close to the query N C 0.5 D 2.1 B 2.3 ➢ All candidates are checked. Finish. ➢ Here, C is the closest to the query Final output 1: Candidates ➢ You can pick up top-k results Final output 2: Checked items ➢ i.e., search path Final output 3: Visit flag ➢ For each item, visited or not
  87. Slides: bit.ly/41BTrrS 88 Observation: runtime ➢ Item comparison takes time;

    𝑂(𝐷), since 𝒒 ∈ ℝ𝐷 and each 𝒙𝑛 ∈ ℝ𝐷 ➢ The overall runtime ~ #item_comparison ∼ length_of_search_path * average_outdegree E.g., a search path of length 3 visiting nodes with outdegrees 1, 2, 2: #item_comparison = 3 * (1 + 2 + 2)/3 = 5
  88. Slides: bit.ly/41BTrrS 89 Observation: candidate

    size ➢ size = 1: Greedy search (candidates end as {C}). Fast, but stuck in a local minimum ➢ size > 1 (here 3): Beam search (candidates end as {C, D, E}). Slow, but finds a better solution ➢ Larger candidate size, better but slower results ➢ Online parameter to control the trade-off ➢ Called “ef” in HNSW
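
The whole walkthrough above (pick the unchecked best candidate, check it, record distances of its neighbors, keep only the best "size" candidates) fits in a few lines. A sketch under my own assumptions: the graph is a plain adjacency dict, vectors live in a dict, and the parameter size plays the role of HNSW's ef; this is not the code of any particular library.

import heapq
import numpy as np

def beam_search(graph, vectors, q, entry, size=3):
    """graph: {node: [neighbor, ...]}, vectors: {node: np.ndarray}, q: query vector."""
    def dist(n):
        return float(np.sum((vectors[n] - q) ** 2))
    visited = {entry}                       # "visit flag" (final output 3)
    checked = []                            # "search path" (final output 2)
    candidates = [(dist(entry), entry)]     # "candidates" (final output 1)
    while True:
        # Pick up the unchecked best candidate; finish when all candidates are checked
        unchecked = [(d, n) for d, n in candidates if n not in checked]
        if not unchecked:
            break
        _, best = min(unchecked)
        checked.append(best)                            # check it
        for nb in graph[best]:                          # find the connected points
            if nb in visited:                           # skip already-visited nodes
                continue
            visited.add(nb)
            candidates.append((dist(nb), nb))           # record the distance to q
        candidates = heapq.nsmallest(size, candidates)  # maintain the candidates (size = ef)
    return candidates, checked, visited

# Toy usage on a small made-up graph; the first candidate is the approximate top-1 result.
vecs = {n: np.random.default_rng(i).normal(size=8) for i, n in enumerate("ABCDE")}
g = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D", "E"], "D": ["B", "C"], "E": ["C"]}
cands, path, seen = beam_search(g, vecs, q=np.zeros(8), entry="A", size=3)
print(cands[0])   # (distance, node) of the approximate nearest neighbor
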
  89. Slides: bit.ly/41BTrrS 90 Graph search algorithms Images are from an

    excellent survey paper [Wang+, VLDB 2021] ➢ Lots of algorithms ➢ Ours also! Ono & Matsui, “Relative NN-Descent”, ACMMM 23 ➢ The basic structure is the same: (1) designing a good graph + (2) beam search
  90. Slides: bit.ly/41BTrrS 91 Just NN? Vector DB? ➢ Vector DB

    companies say “Vector DB is cool” ➢ My own idea: ➢ Which vector DB? ➡ No conclusions! ✓ https://weaviate.io/blog/vector-library-vs-vector-database ✓ https://codelabs.milvus.io/vector-database-101-what-is-a-vector-database/index#2 ✓ https://zilliz.com/learn/what-is-vector-database Try the simplest numpy-only search → Slow? → Try a fast algorithm such as HNSW in faiss → Try Vector DB If speed is the only concern, just use libraries
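
The first two boxes of the flow above in code: a numpy-only exact search, and HNSW in faiss if that becomes too slow. The faiss calls (IndexHNSWFlat, efSearch) are the library's real API, but the data and the parameter values (M = 32, efSearch = 64, k = 10) are illustrative choices of mine.

import numpy as np
import faiss  # pip install faiss-cpu

rng = np.random.default_rng(0)
X = rng.normal(size=(100_000, 128)).astype(np.float32)   # database vectors
q = rng.normal(size=(1, 128)).astype(np.float32)          # one query

# Step 1: simplest numpy-only search (exact, scans all N vectors)
ids_exact = np.argsort(np.sum((X - q) ** 2, axis=1))[:10]

# Step 2: HNSW in faiss (approximate, much faster per query at scale)
index = faiss.IndexHNSWFlat(128, 32)        # dimension, graph degree M
index.hnsw.efSearch = 64                    # candidate-list size ("ef") at query time
index.add(X)
dists, ids_hnsw = index.search(q, 10)       # top-10 approximate neighbors
print(ids_exact[:3], ids_hnsw[0, :3])
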
  91. Slides: bit.ly/41BTrrS 92 ➢ Introduction ✓ Demo: Embedding + Search

    + LLM ➢ The theory behind Vector DB ➢ Discussion
  92. Slides: bit.ly/41BTrrS 93 Recent trends ➢ “Embedding + Search” is

    too naïve? ✓ “A similar sentence to the question” might not be useful at all. ➢ Smarter search? ✓ Classic technologies have been revisited! ✓ E.g., Query expansion: recall [Chum+, ICCV 2007]. They applied query expansion, which was a classic IR technique, to image retrieval. Such revisits repeat! ➢ Directions ✓ Solve in a machine-learning way? E.g., training a special embedding space. ✓ Solve by data structures? E.g., a data structure with special operations.
  93. Slides: bit.ly/41BTrrS 94 Recent trends ➢ “Embedding + Search” is

    too naïve? ✓ “A similar sentence to the question” might not be useful at all. ➢ Smarter search? ✓ Classic technologies have been revisited! ✓ E.g., Query expansion: recall [Chum+, ICCV 2007]. They applied query expansion, which was a classic technique, to image retrieval. Such revisits repeat! ➢ Directions ✓ Solve in a machine-learning way? E.g., training a special embedding space. ✓ Solve by data structures? E.g., a data structure with special operations.
  94. Slides: bit.ly/41BTrrS 95 Recent trends ➢ “Embedding + Search” is

    too naïve? ✓ “A similar sentence to the question” might not be useful at all. ➢ Smarter search? ✓ Classic technologies have been revisited! ✓ E.g., Query expansion: recall [Chum+, ICCV 2007]. They applied query expansion, which was a classic IR technique, to image retrieval. Such revisits repeat! ➢ Directions ✓ Solve in a machine-learning way? E.g., training a special embedding space. ✓ Solve by data structures? E.g., a data structure with special operations.
  95. Slides: bit.ly/41BTrrS 96 From the perspective of Nearest Neighbor Search

    ➢ CPU? GPU? Managed services? ✓ Trade-off! ✓ NVIDIA is pushing GPU search (RAFT & CAGRA [Ootomo+, arXiv 23]) ✓ Can we spend money on search? (Not on training a model) ➢ Next paradigm ✓ Graph-based search is mature. ✓ Do we need a breakthrough idea? ➢ Not trivial: dimensionality (𝐷) is now 1000 (not 100) ✓ If 𝐷 > 100, it's often said that we must reduce it with PCA to 𝐷 = 100. ✓ Because of LLM APIs, we need to handle 𝐷 = 1000 anyway.
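
On the last point, a dimensionality-reduction sketch. The 1536 → 100 numbers are only an example (1536 is the size of OpenAI's text-embedding-ada-002 vectors); whether PCA to 𝐷 = 100 is still the right move for LLM embeddings is exactly the open question raised above.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 1536)).astype(np.float32)   # stand-in for LLM-API embeddings

pca = PCA(n_components=100)                  # classic advice: reduce D to ~100 before indexing
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)                       # (10000, 100)
print(pca.explained_variance_ratio_.sum())   # fraction of variance kept by the 100 components
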
  96. Slides: bit.ly/41BTrrS 97 Use LLM as a building block ➢

    The era of using LLMs as functions in pseudo code has already arrived. ➢ Should we run something locally or call an API? A choice we never had to make before. ➢ How to think about "Reproducibility" in the LLM and API era ➢ Need to be able to roughly estimate what the possible solutions to a problem are, at what cost, and what will be obtained. New paradigm: On-premises own server (☺ Free to use, but initial cost and maintenance) / Cloud (e.g., AWS EC2) (☺ Good balance, but depends on the cloud) / API call (☺ Super cheap, but not good at large processing; communication cost?)
  97. Slides: bit.ly/41BTrrS 98 Reference ➢ Survey for NN search: CVPR

    20 Tutorial ➢ SOTA graph-based search: CVPR 23 Tutorial ➢ Good tutorial for RAG [Asai+, ACL 2023 Tutorial]
  98. Slides: bit.ly/41BTrrS 99 ◼ [Jégou+, TPAMI 2011] H. Jégou+, “Product

    Quantization for Nearest Neighbor Search”, IEEE TPAMI 2011 ◼ [Guo+, ICML 2020] R. Guo+, “Accelerating Large-Scale Inference with Anisotropic Vector Quantization”, ICML 2020 ◼ [Malkov+, TPAMI 2019] Y. Malkov+, “Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs”, IEEE TPAMI 2019 ◼ [Malkov+, IS 13] Y. Malkov+, “Approximate Nearest Neighbor Algorithm based on Navigable Small World Graphs”, Information Systems 2013 ◼ [Fu+, VLDB 19] C. Fu+, “Fast Approximate Nearest Neighbor Search With The Navigating Spreading-out Graph”, VLDB 2019 ◼ [Wang+, VLDB 21] M. Wang+, “A Comprehensive Survey and Experimental Comparison of Graph-Based Approximate Nearest Neighbor Search”, VLDB 2021 ◼ [Iwasaki+, arXiv 18] M. Iwasaki and D. Miyazaki, “Optimization of Indexing Based on k-Nearest Neighbor Graph for Proximity Search in High-dimensional Data”, arXiv 2018 ◼ [Ootomo+, arXiv 23] H. Ootomo+, “CAGRA: Highly Parallel Graph Construction and Approximate Nearest Neighbor Search for GPUs”, arXiv 2023 ◼ [Chum+, ICCV 07] O. Chum+, “Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval”, ICCV 2007 ◼ [Pinecone] https://www.pinecone.io/ ◼ [Milvus] https://milvus.io/ ◼ [Qdrant] https://qdrant.tech/ ◼ [Weaviate] https://weaviate.io/ ◼ [Vertex AI Matching Engine] https://cloud.google.com/vertex-ai/docs/matching-engine ◼ [Vald] https://vald.vdaas.org/ ◼ [Modal] https://modal.com/ Reference