
When Learned Data Structures Meet Computer Vision

Yusuke Matsui
October 28, 2025


9th International Workshop on Symbolic-Neural Learning (SNL2025), Oct 29, 2025 @Osaka
https://im.sanken.osaka-u.ac.jp/snl2025/

Yusuke Matsui (The University of Tokyo)
https://yusukematsui.me/index.html

Learned data structures are a new type of data structure that enhances the performance of classical data structures, such as B-trees, by leveraging machine learning. They have been actively studied in the database field and hold the potential to accelerate procedures across many domains, yet their capabilities are not widely recognized. In this talk, I explore whether learned data structures can be applied to computer vision tasks, and, conversely, how computer vision applications may influence learned data structures, examining their potential as next-generation data structures that incorporate machine learning.


Transcript

  1. When Learned Data Structures Meet Computer Vision. Yusuke Matsui, The University of Tokyo. 9th International Workshop on Symbolic-Neural Learning (SNL2025), Oct 29, 2025 @Osaka. Slides: http://bit.ly/43wois0
  2. Yusuke Matsui ✓ Computer vision ✓ Data structure + machine learning. Lecturer (Assistant Professor), The University of Tokyo, Japan. http://yusukematsui.me @utokyo_bunny @matsui528. Zero-shot comic understanding [Li+, ACMMM 24]; Diverse nearest neighbor search [Matsui, CVPR 25]; ML-enhanced sorting [Sato & Matsui, TMLR 25]
  3. ML is popular: 3D reconstruction [1], ChatGPT, autonomous driving, scientific discovery [2], text-to-image [3], AI coding. [1] Wang+, "VGGT: Visual Geometry Grounded Transformer", CVPR 2025. [2] Fawzi+, "Discovering faster matrix multiplication algorithms with reinforcement learning", Nature, 2022. [3] https://gemini.google/jp/overview/image-generation/ Image sources: https://finance.yahoo.com/news/gm-unveils-eyes-off-self-driving-conversational-google-ai-for-its-cars-at-tech-event-151049822.html https://en.wikipedia.org/wiki/ChatGPT#/media/File:ChatGPT-Logo.svg https://www.claude.com/product/claude-code
  4. (Same examples as slide 3.) ML is popular. We tend to focus on "big problems". Can we apply ML to "small components"? Learned Data Structures (= ML-enhanced data structures)
  5. The origin of learned data structures. Original proposer: Tim Kraska (MIT). Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, Neoklis Polyzotis, "The Case for Learned Index Structures", SIGMOD 2018. SIGMOD 2019 Tutorial: http://people.csail.mit.edu/kraska/pub/sigmod19tutorialpart2.pdf
  6. In this talk, I'll cover: What kinds of data structures exist? What kinds of techniques are used? What can they do, and to what extent? Can they be applied to CV or LLMs?
  7. Outline: 1. One-dimensional index, 2. Multi-dimensional index, 3. Bloom filter, 4. Approximate nearest neighbor search, 5. Discussion
  8. (Section divider; same outline as slide 7, starting 1. One-dimensional index.)
  9. One-dimensional index. Sorted array x: 3 5 12 13 16 19 23 23 25 29. Query: 13 in x? Binary search, B-tree, etc.: std::vector<int> x (sorted), or std::flat_set<int> x in C++23 ➢ std::lower_bound ➢ found
  10. (Animation step; same content as slide 9.)
  11. (Animation step; same content as slide 9.)
  12. One-dimensional index. Sorted array x: 3 5 12 13 16 19 23 23 25 29. Query: 13 in x? Binary search is O(log N). Fast. But it ignores the data distribution. Can we improve this with ML?
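For concreteness, the classical baseline in a few lines of C++ (a minimal sketch; the container choice is incidental):

    #include <algorithm>
    #include <vector>

    int main() {
        // The sorted array x from the slide.
        std::vector<int> x = {3, 5, 12, 13, 16, 19, 23, 23, 25, 29};
        const int query = 13;

        // Binary search: O(log N), oblivious to the data distribution.
        auto it = std::lower_bound(x.begin(), x.end(), query);
        const bool found = (it != x.end() && *it == query); // true
        (void)found;
    }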
  13. One-dimensional index. Sorted array x: 3 5 12 13 16 19 23 23 25 29, with positions 1 2 3 4 5 6 7 8 9 10. Query: 13 in x?
  14. One-dimensional index. Plot position against key for the sorted array x: each entry becomes a point, e.g., (12, 3), (13, 4). Query: 13 in x?
  15. One-dimensional index. If we can train a model pos = f(query) on these points, we can guess 4 ≈ f(13). Query: 13 in x?
  16. Learned (one-dimensional) index. Sorted array x: 3 5 12 13 16 19 23 23 25 29 (positions 1-10). Query: 13 in x? Train ➢ Train f from (3, 1), (5, 2), … Test ➢ Predict pos = f(query), e.g., 4.8 = f(13) ➢ Refine the prediction by binary search around it: 13 vs [13, 16, 19] ➢ Ensure the result is correct (no approximation)
  17. Learned (one-dimensional) index (same recipe as slide 16). It can utilize the data distribution. How to choose f? How to train f?
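To make the train / predict / refine recipe concrete, here is a minimal sketch that uses a single least-squares linear model as f and records the worst prediction error so that the refinement window is exact. This is illustrative only: real learned indexes such as RMI and the PGM-index use hierarchies of models and tighter, provable error bounds.

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // Minimal sketch of a learned one-dimensional index: fit
    // pos ~ f(key) = a*key + b by least squares, record the worst
    // prediction error, and answer queries with a binary search
    // restricted to [pos - max_err, pos + max_err].
    struct LearnedIndex {
        std::vector<int> keys;   // sorted, assumed non-empty
        double a = 0.0, b = 0.0; // linear model: pos = a*key + b
        std::size_t max_err = 0; // worst observed prediction error

        explicit LearnedIndex(std::vector<int> sorted_keys)
            : keys(std::move(sorted_keys)) {
            // Train: least-squares fit of position against key.
            const double n = static_cast<double>(keys.size());
            double sx = 0, sy = 0, sxx = 0, sxy = 0;
            for (std::size_t i = 0; i < keys.size(); ++i) {
                sx += keys[i]; sy += i;
                sxx += double(keys[i]) * keys[i];
                sxy += double(keys[i]) * i;
            }
            const double denom = n * sxx - sx * sx;
            a = (denom != 0.0) ? (n * sxy - sx * sy) / denom : 0.0;
            b = (sy - a * sx) / n;
            // Record the worst error so the refine step stays exact.
            for (std::size_t i = 0; i < keys.size(); ++i) {
                const std::size_t p = predict(keys[i]);
                max_err = std::max(max_err, p > i ? p - i : i - p);
            }
        }

        std::size_t predict(int key) const {
            const double p = a * key + b;
            return static_cast<std::size_t>(
                std::clamp(p, 0.0, double(keys.size() - 1)));
        }

        bool contains(int query) const {
            // Refine: binary search only inside the error window.
            const std::size_t pos = predict(query);
            const std::size_t lo = pos > max_err ? pos - max_err : 0;
            const std::size_t hi = std::min(pos + max_err + 1, keys.size());
            return std::binary_search(keys.begin() + lo, keys.begin() + hi, query);
        }
    };

With the slide's array, LearnedIndex idx({3, 5, 12, 13, 16, 19, 23, 23, 25, 29}) answers idx.contains(13) by binary-searching only a small window around f(13).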
  18. One-dimensional index. The first and most basic model: the Recursive Model Index (RMI) ➢ Arrange search models hierarchically ➢ Each model can be either a B-tree or a neural network ➢ No deep theoretical analysis. Kraska+, "The Case for Learned Index Structures", SIGMOD 2018
  19. One-dimensional index. ALEX ➢ RMI is static (no deletion) ➢ By introducing "gaps", the data can be updated ➢ It is common for a dynamic version to follow once a static structure has been proposed. Ding+, "ALEX: An Updatable Adaptive Learned Index", SIGMOD 2020
  20. One-dimensional index. PGM-index ➢ Recursive, piecewise-linear model ➢ Strong theoretical background ➢ Practically fast. Predict, then refine. Ferragina and Vinciguerra, "The PGM-index: a fully-dynamic compressed learned index with provable worst-case bounds", VLDB 2020. Very useful slides: https://pgm.di.unipi.it/slides-pgm-index-vldb.pdf (the presentation style of this talk is inspired by them)
  21. One-dimensional index. BTW, a "map" can be built the same way; often key -> pointer is used. Sorted array x: 3 5 12 13 16 19 23 23 25 29 with values (3 -> "dog", 5 -> "cat", 12 -> "horse", …). std::flat_map<int, std::string> x / std::flat_map<int, pointer> x. Query: 13 in x? Binary search, B-tree, etc.
  22. (Animation step; same content as slide 21.)
  23. (Section divider; same outline as slide 7, now at 2. Multi-dimensional index.)
  24. Multi-dimensional index. A set of D-dim points (D = 2 here). Query: an orthogonal range. Task: enumerate all points in the range.
  25. Multi-dimensional index. A set of D-dim points (D = 2 here). Query: an orthogonal range. Task: enumerate all points in the range. Classical solution: boost::geometry::rtree. https://www.boost.org/doc/libs/latest/libs/geometry/doc/html/geometry/spatial_indexes/creation_and_modification.html
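For reference, a minimal usage sketch of Boost.Geometry's R-tree for this task (the splitting strategy, node capacity, and coordinates below are arbitrary example choices):

    #include <iterator>
    #include <vector>
    #include <boost/geometry.hpp>
    #include <boost/geometry/index/rtree.hpp>

    namespace bg = boost::geometry;
    namespace bgi = boost::geometry::index;

    using Point = bg::model::point<double, 2, bg::cs::cartesian>;
    using Box = bg::model::box<Point>;

    int main() {
        // R-tree over 2-D points (quadratic split, <= 16 entries per node).
        bgi::rtree<Point, bgi::quadratic<16>> rt;
        rt.insert(Point(1.0, 2.0));
        rt.insert(Point(3.0, 4.0));
        rt.insert(Point(5.0, 1.0));

        // Orthogonal range query: enumerate all points inside the box.
        Box range(Point(0.0, 0.0), Point(4.0, 5.0));
        std::vector<Point> result;
        rt.query(bgi::intersects(range), std::back_inserter(result));
        // result now holds (1,2) and (3,4).
    }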
  26. Multi-dimensional index. Multi-dim indexing is still a very challenging problem, especially with a large D. Can ML boost the performance?
  27. Multi-dimensional index. Flood ➢ Index ✓ Partition the space into cells ✓ Sort the data in each cell ➢ Test ✓ Binary search + scan ➢ Where is the ML? ✓ Hyperparameters (the cell layout) are selected from training data. Nathan+, "Learning Multi-dimensional Indexes", SIGMOD 2020
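A toy sketch of the classical machinery underlying Flood, with the grid resolution exposed as the knob that Flood would learn from training data. Everything here is simplified: the real system partitions D-1 dimensions and tunes the layout against a query workload, and the class and names below are illustrative.

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // Partition the x-axis into equal-width columns, sort each column
    // by y, and answer orthogonal range queries via binary search + scan.
    struct Point { double x, y; };

    class GridIndex {
        std::vector<std::vector<Point>> cols_;
        double xmin_, xmax_;
    public:
        // In Flood, num_cols (the grid layout) is the hyperparameter
        // chosen by a model trained on the data and sample queries;
        // here it is simply passed in. Assumes pts is non-empty.
        GridIndex(std::vector<Point> pts, std::size_t num_cols) {
            const auto [lo, hi] = std::minmax_element(pts.begin(), pts.end(),
                [](const Point& a, const Point& b) { return a.x < b.x; });
            xmin_ = lo->x; xmax_ = hi->x;
            cols_.resize(num_cols);
            for (const Point& p : pts) cols_[col(p.x)].push_back(p);
            for (auto& c : cols_)  // sort within each column by y
                std::sort(c.begin(), c.end(),
                          [](const Point& a, const Point& b) { return a.y < b.y; });
        }
        std::size_t col(double x) const {
            const double t =
                std::clamp((x - xmin_) / (xmax_ - xmin_ + 1e-12), 0.0, 1.0);
            return std::min(cols_.size() - 1,
                            static_cast<std::size_t>(t * cols_.size()));
        }
        // Enumerate all points with x in [qx0, qx1] and y in [qy0, qy1].
        std::vector<Point> range(double qx0, double qx1,
                                 double qy0, double qy1) const {
            std::vector<Point> out;
            for (std::size_t c = col(qx0); c <= col(qx1); ++c) {
                const auto& v = cols_[c];
                auto it = std::lower_bound(v.begin(), v.end(), qy0,
                    [](const Point& p, double y) { return p.y < y; });
                for (; it != v.end() && it->y <= qy1; ++it)
                    if (qx0 <= it->x && it->x <= qx1) out.push_back(*it);
            }
            return out;
        }
    };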
  28. Multi-dimensional index. FlexFlood ➢ Data updates are challenging: data update -> distribution shift -> slow querying ➢ Proposes an effective data-update procedure for Flood ✓ Keeps search fast after updates ✓ Gives the amortized computational cost of an update. Hidaka and Matsui, "FlexFlood: Efficiently Updatable Learned Multi-dimensional Index", NeurIPS WS 2024
  29. (Section divider; same outline as slide 7, now at 3. Bloom filter.)
  30. Bloom filter ➢ A probabilistic data structure that maintains a set approximately ➢ Something like std::set<Item> ➢ Memory efficient, e.g., 10 bits/item ➢ Fast membership querying ➢ A Bloom filter ships in Boost 1.89+. For inserted items, may_contain is always correct; for non-inserted items, may_contain may be wrong.
    // 5 hashes per item. 1,000,000-bit array.
    boost::bloom::filter<std::string, 5> f(1'000'000);
    // Similar to std::set<std::string>
    f.insert("dog");
    f.insert("cat");
    // Always correct
    assert(f.may_contain("dog") == true);
    if (f.may_contain("rabbit")) { // Likely false
        // ...
    }
  31. Bloom filter. B-bit array (all zeros): 0 0 0 0 0 0 0 0 0 0 0 0. Insert "dog".
  32. Bloom filter. Hash functions h_k: item -> {1, …, B}. h1("dog") = 9. B-bit array: 0 0 0 0 0 0 0 0 1 0 0 0.
  33. Bloom filter. h1("dog") = 9, h2("dog") = 4. B-bit array: 0 0 0 1 0 0 0 0 1 0 0 0.
  34. Bloom filter. Insert "cat". B-bit array: 0 0 0 1 0 0 0 0 1 0 0 0.
  35. Bloom filter. h1("cat") = 6. B-bit array: 0 0 0 1 0 1 0 0 1 0 0 0.
  36. Bloom filter. h1("cat") = 6, h2("cat") = 11. B-bit array: 0 0 0 1 0 1 0 0 1 0 1 0.
  37. Bloom filter. The B-bit array now represents "dog" and "cat": 0 0 0 1 0 1 0 0 1 0 1 0.
  38. Bloom filter. Contains "dog"?
  39. Bloom filter. Contains "dog"? h1("dog") = 9, h2("dog") = 4 ➢ All bits are 1 ➢ Return "yes".
  40. Bloom filter. Contains "rabbit"?
  41. Bloom filter. Contains "rabbit"? h1("rabbit") = 2, h2("rabbit") = 11 ➢ Some bits are 0 ➢ Return "no".
  42. Bloom filter. Contains "frog"?
  43. Bloom filter. Contains "frog"? h1("frog") = 11, h2("frog") = 9 ➢ All bits are 1 ➢ Return "yes". Wrong! Hash collisions.
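Putting the walkthrough together, a toy from-scratch version (the seeded std::hash trick below is a stand-in for K properly independent hash functions; it is not a production scheme, and bits are zero-indexed here):

    #include <bitset>
    #include <cstddef>
    #include <functional>
    #include <string>

    // A B-bit array plus K hash functions, as in the walkthrough above.
    template <std::size_t B, std::size_t K>
    class TinyBloom {
        std::bitset<B> bits_;
        static std::size_t hash(const std::string& item, std::size_t k) {
            // Mix a per-k seed into std::hash's output.
            return (std::hash<std::string>{}(item)
                    ^ ((k + 1) * 0x9e3779b97f4a7c15ULL)) % B;
        }
    public:
        void insert(const std::string& item) {
            for (std::size_t k = 0; k < K; ++k) bits_.set(hash(item, k));
        }
        bool may_contain(const std::string& item) const {
            // "no" is always correct; "yes" may be a false positive
            // caused by hash collisions (the "frog" case above).
            for (std::size_t k = 0; k < K; ++k)
                if (!bits_.test(hash(item, k))) return false;
            return true;
        }
    };

Conceptually this reproduces the slides: after inserting "dog" and "cat", may_contain("dog") is true, a query that hits any 0 bit returns false, and a query whose bits all collide with set bits yields a false positive.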
  44. Bloom filter. Two interpretations of the BF's behavior. Suppose the BF contains the items S. Focusing on items: ➢ For a query x ∈ S, the BF returns the correct answer ("yes") ➢ For a query y ∉ S, the BF may return a wrong answer (should be "no", but sometimes "yes"). Focusing on answers: ➢ If the BF returns "no", it is always correct ➢ If the BF returns "yes", it may be wrong. These are the same thing.
  45. (Same two interpretations as slide 44.) The BF is good, but it completely ignores the data distribution. Can we boost the performance of the BF with ML?
  46. Bloom filter. The original Learned Bloom Filter (LBF): (1) Prepare an ML model that predicts whether an item is in the set (2) Given a query, estimate with the ML model whether it is in the set (3) If the ML model's score is low, use a backup BF. ➢ With an ML model, the overall trade-off (memory cost vs. false positive rate) is better ➢ Maintains the BF's characteristics. Kraska+, "The Case for Learned Index Structures", SIGMOD 2018
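A minimal sketch of the LBF query path under these three steps. The names are placeholders: model is any trained scorer, tau is the decision threshold, and the backup filter must be built over exactly the inserted keys whose score falls below tau, so that inserted items can never receive a "no".

    #include <functional>
    #include <string>
    #include <utility>

    template <class BloomFilter>
    class LearnedBloomFilter {
        std::function<double(const std::string&)> model; // score in [0, 1]
        double tau;         // threshold tuned on held-out non-keys
        BloomFilter backup; // catches the model's false negatives
    public:
        LearnedBloomFilter(std::function<double(const std::string&)> m,
                           double threshold, BloomFilter bf)
            : model(std::move(m)), tau(threshold), backup(std::move(bf)) {}

        bool may_contain(const std::string& x) const {
            if (model(x) >= tau) return true; // model says "in the set"
            return backup.may_contain(x);     // otherwise ask the backup BF
        }
    };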
  47. Bloom filter. Sandwiched LBF: a pre-BF (placed before the model) is also useful. Learned Bloom Filter (LBF) [Kraska+, SIGMOD 2018]. Mitzenmacher, "A Model for Learned Bloom Filters and Optimizing by Sandwiching", NeurIPS 2018
  48. Bloom filter. Partitioned Learned Bloom Filter (PLBF) ➢ Prepare several BFs ➢ Select one based on the model's score (MoE?). Vaidya+, "Partitioned Learned Bloom Filters", ICLR 2021. Sato and Matsui, "Fast Partitioned Learned Bloom Filters", NeurIPS 2023
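A sketch of the PLBF idea in the same style: the model's score range is partitioned by thresholds into regions, each with its own Bloom filter (a region where the model is confident can use few or no bits). All names are illustrative, and the real PLBF optimizes the thresholds and per-region filter sizes; filters.size() must equal thresholds.size() + 1.

    #include <cstddef>
    #include <functional>
    #include <string>
    #include <utility>
    #include <vector>

    template <class BloomFilter>
    class PartitionedLBF {
        std::function<double(const std::string&)> model;
        std::vector<double> thresholds;   // ascending region boundaries
        std::vector<BloomFilter> filters; // one filter per score region
    public:
        PartitionedLBF(std::function<double(const std::string&)> m,
                       std::vector<double> t, std::vector<BloomFilter> f)
            : model(std::move(m)), thresholds(std::move(t)), filters(std::move(f)) {}

        std::size_t region(double score) const {
            std::size_t i = 0;
            while (i < thresholds.size() && score >= thresholds[i]) ++i;
            return i;
        }
        // Keys go into the filter of their own score region,
        // so each lookup touches exactly one filter.
        void insert(const std::string& x) { filters[region(model(x))].insert(x); }
        bool may_contain(const std::string& x) const {
            return filters[region(model(x))].may_contain(x);
        }
    };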
  49. Bloom filter. Cascaded LBF (CLBF): generalizes the LBF, Sandwiched LBF, and PLBF. Sato and Matsui, "Cascaded Learned Bloom Filter for Optimal Model-Filter Size Balance and Fast Rejection", arXiv 2025
  50. (Section divider; same outline as slide 7, now at 4. Approximate nearest neighbor search.)
  51. Approximate Nearest Neighbor Search. Given database vectors x_1, x_2, …, x_N with x_n ∈ R^D and a query q, find argmin_{n ∈ {1,2,…,N}} ||q − x_n||_2^2 (in the slide's example, a query such as (0.23, 3.15, 0.65, 1.43) retrieves its nearest neighbor x_74 ≈ (0.20, 3.25, 0.72, 1.68)). ➢ Several approaches: graph-based, inverted index, quantization, etc. ➢ See our tutorials [CVPR2020], [CVPR2023]. [CVPR2020] Matsui, Yamaguchi, and Wang, "Image Retrieval in the Wild", CVPR 2020 Tutorial. [CVPR2023] Matsui, Aumüller, and Xiao, "Neural Search in Action", CVPR 2023 Tutorial
  52. Approximate Nearest Neighbor Search. Vector quantizer: q(x) = argmin_{k ∈ {1,2,…,K}} ||x − c_k||_2^2, where the hyperparameters {c_k}_{k=1}^{K} are trained with training data (figure: centroids c_1, …, c_7 partition the space, and a vector x is assigned to its nearest centroid). Although rarely mentioned, many ANN technologies can be considered learned data structures; interaction between the ANN and learned data structures communities would be beneficial. Douze, "Machine learning and high dimensional vector search", arXiv 2025 (Matthijs Douze is an original inventor of faiss): "In their seminal work The case for learned indexing structures [32], Kraska et al. introduced a series of ML based algorithms to speed up classical indexing structures like hash tables and B-trees. … However, 7 years later, it is striking that ML has had very little impact on VS. No ML based method appears in the current VS benchmarks at any scale [2, 47, 46]. The most advanced ML tool that is widely used for VS is the k-means clustering algorithm. There is no deep learning, no use of high-capacity networks."
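The quantizer itself is only a few lines; the "learned" part lives entirely in the codebook, which would typically be trained with k-means on a sample of the data (a minimal sketch with illustrative names):

    #include <cstddef>
    #include <limits>
    #include <vector>

    // q(x) = argmin_k ||x - c_k||_2^2 over a trained codebook of
    // K centroids, each of the same dimension D as x.
    std::size_t quantize(const std::vector<double>& x,
                         const std::vector<std::vector<double>>& centroids) {
        std::size_t best = 0;
        double best_dist = std::numeric_limits<double>::max();
        for (std::size_t k = 0; k < centroids.size(); ++k) {
            double d = 0.0;
            for (std::size_t j = 0; j < x.size(); ++j) {
                const double diff = x[j] - centroids[k][j];
                d += diff * diff;
            }
            if (d < best_dist) { best_dist = d; best = k; }
        }
        return best;
    }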
  53. (Section divider; same outline as slide 7, now at 5. Discussion.)
  54. How to incorporate ML into data structures? Let f(x; θ) be a classical data structure or algorithm with input x and hyperparameter θ. ➢ Approach 1: place an ML model before or after f (e.g., Bloom filter) ➢ Approach 2: replace part or all of f with an ML model (e.g., one-dim index) ➢ Approach 3: obtain the hyperparameter θ using ML (e.g., multi-dim index, ANN). Each approach has been proposed independently; a unified theory, taxonomy, or methodology would be appreciated.
  55. Which data structures should we target? Target data structures seem to be selected ad hoc; to which data structures should ML truly be applied? Can we consider high-dimensional data more? Currently, only ANN covers it.
  56. Applications to computer vision. Would it be better to assume a specific application domain, e.g., CV? ➢ Efficient NeRF approaches have been proposed with new data structures ➢ What is their relation to learned data structures? Müller+, "Instant Neural Graphics Primitives with a Multiresolution Hash Encoding", SIGGRAPH 2022
  57. Applications to computer vision. Would it be better to assume a specific application domain, e.g., CV? ➢ Language field = 3D index + high-dim data? ➢ Can we leverage learned data structures to accelerate the processing? Qin+, "LangSplat: 3D Language Gaussian Splatting", CVPR 2024
  58. (Same LangSplat example as slide 57.) CV and learned data structures can interact in both directions. Import: bring learned data structure techniques into CV. Export: design learned data structures suited to CV problems.
  59. PhD Student Recruitment ➢ Our lab is recruiting PhD students for admission in or after April 2027 (current M1+ students)! ➢ Research areas: computer vision, databases, machine learning, etc. (CV × (learned) data structures) ➢ Desired qualities ✓ Passion for research (most important) ✓ Experience with publications at top conferences or in journals is a plus ✓ Applicants should first consult with their advisor and then contact me.