Binary Embeddings For Efficient Ranking

Binary Latent Representations for Efficient Ranking
Maciej Kula
Ravelin

Item set size a challenge in LSRS
When you have 10s or 100s of millions of items in your system, embeddings are challenging to
• estimate
• store
• and compute predictions with.
True both in offline and online systems, but problem especially severe online.

Online settings increasingly important
Need online predictions for
• contextual models
• incorporating new interactions

Online settings have hard constraints
In online settings, have ~100ms to
• update models
• retrieve candidates
• perform scoring.
Can you carry out 100 million dot products in under 100ms? Still need to fit in business logic, network latency and so on.

Solutions
• Heuristics
• ANN search
• More compact representations.
Less storage and computation required for smaller embedding dimensions, at expense of accuracy.

Use binary representations instead

Binary dot product
Scaled XNOR as the binary analogue of a dot product. Successfully used for binary CNNs.
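The idea can be sketched in a few lines of Python. This is an illustration, not the formula from the slide: the function names are hypothetical, and the per-vector scale factors `a_scale` and `b_scale` are an assumption about what "scaled" means here. The sketch relies on the identity that, for vectors with entries in {-1, +1}, the dot product equals twice the number of positions where the signs agree (the XNOR count) minus the dimension.

```python
def sign_bits(values):
    """Binarise a real-valued vector: True stands for +1, False for -1."""
    return [v >= 0 for v in values]


def xnor_dot(a_bits, b_bits):
    """Dot product of two {-1, +1} vectors stored as booleans.

    XNOR is true where the two bits agree. Each agreement contributes +1
    to the dot product and each disagreement -1, hence 2 * matches - dim.
    """
    matches = sum(1 for a, b in zip(a_bits, b_bits) if a == b)
    return 2 * matches - len(a_bits)


def scaled_xnor_dot(a_bits, b_bits, a_scale, b_scale):
    """Hypothetical scaling: one real-valued factor per vector (an assumption)."""
    return a_scale * b_scale * xnor_dot(a_bits, b_bits)


user = [0.5, -1.2, 0.3, -0.1]
item = [1.0, 0.4, -0.7, -2.0]
print(xnor_dot(sign_bits(user), sign_bits(item)))
```

For the two vectors above the signs agree in two of four positions, so the binary dot product is 0.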

Benefits
Space
• Real-valued representations require 4 bytes per dimension
• 32 binary dimensions in 4 bytes
Speed
• Two floating point operations per dimension
• XNOR all 32 dimensions in two clock cycles
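A back-of-the-envelope calculation makes the space and compute figures concrete. The catalogue size of 100 million items is a hypothetical value chosen to match the scale mentioned earlier in the talk, and the function names are illustrative.

```python
def float_bytes(n_items, dim):
    """Storage for float32 embeddings: 4 bytes per dimension."""
    return n_items * dim * 4


def binary_bytes(n_items, dim):
    """Storage for binary embeddings: 8 dimensions per byte (dim assumed a multiple of 8)."""
    return n_items * dim // 8


def dot_product_flops(n_items, dim):
    """One multiply and one add per dimension for every item scored."""
    return 2 * n_items * dim


n_items = 100_000_000  # hypothetical catalogue size
dim = 32

print(float_bytes(n_items, dim), "bytes as floats")
print(binary_bytes(n_items, dim), "bytes as bits")
print(dot_product_flops(n_items, dim), "floating point operations per full scoring pass")
```

At equal dimensionality the binary representation is 32 times smaller: 12.8 GB of floats against 0.4 GB of bits in this hypothetical case.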

Does this offer a better tradeoff than simply reducing the latent dimensionality?

Experiments
Set up:
• Standard learning-to-rank matrix factorization model
• Evaluated on MovieLens 1M
Key metrics:
• MRR
• Predictions per millisecond
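For reference, mean reciprocal rank can be computed as below. This is a minimal sketch that assumes one held-out item per user; the evaluation protocol used in the experiments may differ, and the function names are illustrative.

```python
def reciprocal_rank(scores, target_index):
    """1 / rank of the held-out item, where rank 1 is the highest score."""
    target_score = scores[target_index]
    rank = 1 + sum(1 for s in scores if s > target_score)
    return 1.0 / rank


def mean_reciprocal_rank(score_lists, target_indices):
    """Average the reciprocal rank over users."""
    ranks = [reciprocal_rank(s, t) for s, t in zip(score_lists, target_indices)]
    return sum(ranks) / len(ranks)


# Two users, three candidate items each.
scores = [[0.1, 0.9, 0.5], [0.1, 0.9, 0.5]]
held_out = [1, 2]  # first user's item ranks 1st, second user's ranks 2nd
print(mean_reciprocal_rank(scores, held_out))
```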

Models
Baseline:
• 2 embedding layers, for users and items
• Negative sampling
• BPR loss with tied weights
• Adaptive hinge loss with tied weights
Binary model:
• Embeddings followed by a sign function
• Trained by backpropagation
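The two pairwise losses named above can be sketched on plain scores. BPR is the standard negative log-sigmoid of the score difference. The adaptive hinge shown here picks the highest-scoring of several sampled negatives before applying a hinge; that reading of "adaptive" is an assumption, as are the function names and the margin of 1.

```python
import math


def bpr_loss(positive_score, negative_score):
    """-log(sigmoid(pos - neg)), written as log(1 + exp(-(pos - neg)))."""
    diff = positive_score - negative_score
    return math.log1p(math.exp(-diff))


def hinge_loss(positive_score, negative_score, margin=1.0):
    """Zero once the positive item outscores the negative by the margin."""
    return max(0.0, margin - (positive_score - negative_score))


def adaptive_hinge_loss(positive_score, negative_scores, margin=1.0):
    """Hinge loss against the hardest (highest-scoring) sampled negative.

    Assumption: this is one common interpretation of an adaptive hinge loss.
    """
    hardest_negative = max(negative_scores)
    return hinge_loss(positive_score, hardest_negative, margin)


print(bpr_loss(2.0, 0.5))
print(adaptive_hinge_loss(1.0, [0.2, 0.8, -1.0]))
```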

Backpropagation
Sign function is not differentiable. Use a continuous version.

Backpropagation
Normal forward pass. In the backward pass, gradients are applied to the real-valued embedding layers. We can discard those once the model has been estimated.
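The training scheme can be illustrated with a toy forward and backward pass. The continuous surrogate used in this sketch is the clipped identity (hard tanh), whose gradient is 1 inside [-1, 1] and 0 outside; that choice is an assumption, since the exact function used in the talk is not preserved in this transcript. All names are illustrative.

```python
def sign(x):
    """Binarise a single value to -1.0 or +1.0."""
    return 1.0 if x >= 0 else -1.0


def surrogate_grad(x):
    """Gradient of the clipped identity, used in place of the sign gradient (assumption)."""
    return 1.0 if abs(x) <= 1.0 else 0.0


def forward(user, item):
    """Normal forward pass: the score uses the binarised embeddings."""
    return sum(sign(u) * sign(v) for u, v in zip(user, item))


def backward(user, item, grad_score):
    """Gradients with respect to the real-valued embeddings.

    The sign function is treated as its continuous surrogate, so the gradient
    flows through to the underlying real values.
    """
    grad_user = [grad_score * sign(v) * surrogate_grad(u) for u, v in zip(user, item)]
    grad_item = [grad_score * sign(u) * surrogate_grad(v) for u, v in zip(user, item)]
    return grad_user, grad_item


user = [0.2, -0.5, 1.5]
item = [0.3, 0.4, -0.1]
print(forward(user, item))
print(backward(user, item, grad_score=1.0))
```

After training, only the sign bits of each embedding are needed for prediction, which is why the real-valued layers can be thrown away.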

Predictions
Implemented in C. The baseline is a standard dot product using SIMD intrinsics.

Aside: SIMD

XNOR
• 8-float wide XOR
• 8-float wide negation
• count one bits
• scaling
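The four steps can be mirrored in Python on packed 32-bit words. This is a scalar sketch for illustration only: the actual implementation is in C and works on registers holding eight 32-bit words at a time. The helper names and the per-vector scale factors are assumptions.

```python
WORD_BITS = 32
WORD_MASK = 0xFFFFFFFF


def pack_signs(values):
    """Pack the signs of a real-valued vector into 32-bit words (bit set means +1)."""
    if len(values) % WORD_BITS != 0:
        raise ValueError("dimension must be a multiple of 32")
    words = []
    for start in range(0, len(values), WORD_BITS):
        word = 0
        for offset, value in enumerate(values[start:start + WORD_BITS]):
            if value >= 0:
                word |= 1 << offset
        words.append(word)
    return words


def xnor_score(user_words, item_words, user_scale=1.0, item_scale=1.0):
    """Score one item against one user from packed sign bits."""
    matches = 0
    for u, v in zip(user_words, item_words):
        x = u ^ v                      # XOR: bits that differ
        x = ~x & WORD_MASK             # negation: bits that agree
        matches += bin(x).count("1")   # count one bits
    dim = WORD_BITS * len(user_words)
    # scaling: map the match count to a {-1, +1} dot product, then apply
    # hypothetical per-vector scale factors
    return user_scale * item_scale * (2 * matches - dim)


user = pack_signs([1.0] * 64)
item = pack_signs([1.0] * 32 + [-1.0] * 32)
print(xnor_score(user, user), xnor_score(user, item))
```

In this example the user agrees with itself in all 64 positions and with the item in exactly half of them, so the unscaled scores are 64 and 0.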

Results

Bottom line
Moving from 1024 to 32 dimensions in the continuous model implies a 29-fold increase in prediction speed at the expense of a modest 4% decrease in accuracy.
Moving from a float representation to a 1024-dimensional binary representation implies a sharper accuracy drop of 6% in exchange for a smaller 20-fold increase in prediction speed.

More promising approaches
• Maximum inner product search
• Bloom embeddings!

Thanks!
@Maciej_Kula
Source code: github.com/maciejkula/binge
