SSE: Stable Static Embedding

SSE (Stable Static Embedding): Unlocking the Potential of Static Embeddings, A Dynamic Tanh Normalization Approach without Speed Penalty

Presented at Local AI LT Conference

About us
Rikka Botan, independent researcher (machine learning / algebra / mathematical logic)
◆Hobby: making sweets, tea, listening to classical music, clothes
◆Recent Activities: Silver Award, Liquid AI Hackathon Series | Tokyo; article writing (related to Mamba, LFM2 (LTCs))
Links: X (Twitter), Portfolio

Firstly
The author acknowledges the support of Saldra, Witness, and Lumina Logic Minds for providing the computational resources used in this work.

Contents
1 Introduction
2 Method
3 Evaluations
4 Application

Introduction
◆The Importance of Fast Search
RAG (Retrieval-Augmented Generation), recommendation systems, and internal document search all need to retrieve relevant information quickly from millions to tens of billions of documents. The balance between response speed and search accuracy strongly affects the user experience. Many systems use a two-stage configuration called Retrieval + Reranking.

Introduction
◆Related Studies
Year / Paper or Model / Author / Feature
▪2013 / Word2Vec / Tomas Mikolov et al. / Embeds words into low-dimensional vectors using Skip-gram / CBOW. One of the earliest static embeddings based on word co-occurrence.
▪2014 / GloVe / Jeffrey Pennington et al. / Static embeddings built from the statistics of word co-occurrence matrices. A representative method alongside Word2Vec.
▪2019 / Sentence-BERT / Nils Reimers et al. / Generates sentence embeddings using a Siamese BERT architecture (learns similarity between sentence vectors). High quality but computationally expensive.
▪2024 / Model2Vec / MinishLab / Distills Sentence Transformers into a compact static embedding model.
▪2025 / Static Retrieval MRL / Tom Aarsen / Fast static sentence embedding with averaged token embeddings. Matryoshka loss and contrastive learning. 100–400× faster on CPU.
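As a rough illustration of the "averaged token embeddings" idea mentioned above, the sketch below embeds a text by looking up one fixed vector per token and averaging them. The vocabulary, the 4-dimensional vectors, and the function names `embed` and `cosine` are hypothetical and chosen only for this example; real static embedding models use a trained tokenizer and a much larger table.

```python
import math

# Hypothetical toy embedding table (illustrative values, not trained weights).
EMBEDDINGS = {
    "fast":   [0.9, 0.1, 0.0, 0.2],
    "search": [0.8, 0.3, 0.1, 0.0],
    "sweet":  [0.0, 0.1, 0.9, 0.4],
    "tea":    [0.1, 0.0, 0.8, 0.5],
}
DIM = 4

def embed(text):
    """Average the static vectors of the known tokens (no attention, no context)."""
    tokens = [t for t in text.lower().split() if t in EMBEDDINGS]
    if not tokens:
        return [0.0] * DIM
    return [sum(EMBEDDINGS[t][d] for t in tokens) / len(tokens) for d in range(DIM)]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

query = embed("fast search")
print(cosine(query, embed("search")))     # related text scores higher
print(cosine(query, embed("sweet tea")))  # unrelated text scores lower
```

Because inference is only a table lookup and a mean, the cost per text is linear in the number of tokens, which is where the CPU speed advantage of static models comes from.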

Research perspective
Question: The architecture of static embedding models has not changed since Word2Vec; only the training methods have been improved.
Challenge: Because the architecture is so simple, it is difficult to improve it without sacrificing speed, and adopting highly expressive operations is also challenging.
Idea: Control the representation space through interaction with the learning mechanism and training process, rather than through individual modules.

Method
◆Introduction of Separable DyT (Dynamic Tanh normalization) and Construction of SSE (Stable Static Embedding)

Method
◆Gradient Control and Improved Generalization of Representation Space by Separable DyT
▪Architecture
▪Algorithm
$y_k = c_k \tanh(a_k x_k + b_k)$
$\frac{\partial y_k}{\partial x_k} = c_k a_k \,\mathrm{sech}^2(a_k x_k + b_k)$
Decay of the saturated dimensions: $\frac{\partial y_k}{\partial x_k} \to 0$ when $|a_k x_k + b_k| \gg 1$. Learning signals with high noise are attenuated.
Maintenance of unsaturated dimensions: $\frac{\partial y_k}{\partial x_k} \approx c_k a_k$ when $|a_k x_k + b_k| < 1$. Learning signals with stable information are maintained.
Without explicit hyperparameters, implicit regularization enhances the generalization performance of the representation space.
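The per-dimension formula and its derivative can be checked numerically with a minimal sketch like the one below. It assumes that "separable" means each dimension k has its own scalars a_k, b_k, c_k, as the indices in the formula suggest; the function names and the parameter values are hypothetical, and where the layer sits inside the actual SSE model is not specified here.

```python
import math

def separable_dyt(x, a, b, c):
    """Forward pass: y_k = c_k * tanh(a_k * x_k + b_k), applied per dimension."""
    return [ck * math.tanh(ak * xk + bk) for xk, ak, bk, ck in zip(x, a, b, c)]

def separable_dyt_grad(x, a, b, c):
    """Analytic derivative dy_k/dx_k = c_k * a_k * sech^2(a_k * x_k + b_k)."""
    grads = []
    for xk, ak, bk, ck in zip(x, a, b, c):
        z = ak * xk + bk
        sech2 = 1.0 - math.tanh(z) ** 2  # identity: sech^2(z) = 1 - tanh^2(z)
        grads.append(ck * ak * sech2)
    return grads

# Hypothetical parameters: identity-like scaling in both dimensions.
a, b, c = [1.0, 1.0], [0.0, 0.0], [1.0, 1.0]
x = [0.1, 8.0]  # first dimension unsaturated, second deeply saturated

print(separable_dyt(x, a, b, c))
print(separable_dyt_grad(x, a, b, c))  # first entry near c*a = 1, second near 0
```

With these values the unsaturated dimension passes almost the full gradient while the saturated one passes almost none, which is the attenuation behavior described above.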

Evaluations
Figure: Comparison of (a) loss and (b) gradient norm across training steps.
➢ The gradient is maintained even in the later stages of training, so parameters continue to be updated.

Evaluations
Figures: NanoBEIR mean nDCG@10 across training steps; NanoBEIR English mean nDCG@10 vs. Matryoshka embedding truncation.
➢ SSE consistently surpasses the baseline in the latter half of training and at large embedding dimensions.
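Matryoshka embedding truncation, as evaluated above, is commonly done by keeping only the leading coordinates of a vector and renormalizing. The sketch below shows that step under this assumption; the function name `truncate_and_normalize` and the sample vector are hypothetical.

```python
import math

def truncate_and_normalize(vec, dim):
    """Keep the first `dim` coordinates, then rescale to unit length."""
    if not 1 <= dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(v * v for v in head))
    return [v / norm for v in head] if norm > 0 else head

full = [3.0, 4.0, 12.0]  # toy 3-dimensional embedding
small = truncate_and_normalize(full, 2)
print(small)  # 2-dimensional unit vector built from the leading coordinates
```

A model trained with a Matryoshka loss is encouraged to place the most useful information in the leading dimensions, so the truncated vector remains usable for cosine-similarity search at a lower storage and compute cost.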

Evaluations
Figure: (a) Retrieval performance (nDCG@10) across NanoBEIR English tasks. (b) Mean nDCG@10 vs. inference speed (QPS: queries per second), measured on TREC-COVID and Quora using an Intel® Core Ultra 7 265K (3.90 GHz) with batch size 32.
➢ In the English document retrieval task, we have reached the frontier of speed and accuracy.
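For readers unfamiliar with the metric, nDCG@10 can be computed as in the sketch below. It uses the linear-gain variant (gain equal to the relevance label) and assumes the input list holds the relevance labels of every judged document for the query, in ranked order; evaluation toolkits may differ in these details, and the function names are hypothetical.

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k ranked relevance labels."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_relevances, k=10):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0

print(ndcg_at_k([1, 0, 0]))  # relevant document ranked first
print(ndcg_at_k([0, 1, 0]))  # relevant document ranked second, lower score
```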

Evaluations
Figure: (a) Retrieval performance (nDCG@10) across NanoBEIR Japanese tasks. (b) Mean nDCG@10 vs. inference speed (QPS: queries per second), measured on MIRACL using an Intel® Core Ultra 7 265K (3.90 GHz) with batch size 32.
➢ In the Japanese document retrieval task, we have also reached the frontier of speed and accuracy.

Evaluations
Figure: PCA spectrum on the 13 NanoBEIR English datasets: normalized eigenvalue decay on (a) linear scale and (b) logarithmic scale.
In SSE, the eigenvalues decay at smaller dimension indices.
➢ By suppressing noise, low-rank regularization (concentration of information into a compact subspace) is implicitly achieved.

Application
Tens of thousands of documents can be scored within one second using only a general-purpose CPU. Combined with web search (such as DuckDuckGo), lightweight reference search can also be realized.
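The scoring step described above amounts to a similarity computation between one query vector and many document vectors, followed by a top-k selection. The sketch below shows the logic in plain Python under the assumption that all vectors are already unit-normalized, so the dot product equals cosine similarity; the function name `top_k` and the toy vectors are hypothetical, and a vectorized library would normally be used to reach the throughput mentioned.

```python
import heapq

def top_k(query_vec, doc_vecs, k=3):
    """Return the k best (score, doc_index) pairs by dot product, best first."""
    scored = (
        (sum(q * d for q, d in zip(query_vec, doc)), idx)
        for idx, doc in enumerate(doc_vecs)
    )
    return heapq.nlargest(k, scored)

# Toy unit vectors standing in for precomputed document embeddings.
docs = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
query = [1.0, 0.0]
print(top_k(query, docs, k=2))  # documents 0 and 2 rank highest
```

Since document embeddings can be computed once and stored, only the query has to be embedded at search time, which keeps the whole pipeline CPU-friendly.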

References for SSE (Stable Static Embedding)
◆SSE Collection
