
Binary Embeddings For Efficient Ranking

Maciej Kula
August 31, 2017

Talk given during the Large Scale Recommender System workshop at the ACM RecSys 2017 conference.

Transcript

  1. Item set size a challenge in LSRS

     When you have 10s or 100s of millions of items in your system, embeddings are challenging to
     • estimate,
     • store,
     • and compute predictions with.
     True in both offline and online systems, but the problem is especially severe online.
  2. Online settings have hard constraints

     In online settings, you have ~100ms to
     • update models,
     • retrieve candidates,
     • and perform scoring.
     Can you carry out 100 million dot products in under 100ms? You still need to fit in business logic, network latency, and so on.
  3. Solutions

     • Heuristics
     • ANN search
     • More compact representations: smaller embedding dimensions require less storage and computation, at the expense of accuracy.
  4. Binary dot product

     Scaled XNOR as the binary analogue of a dot product. Successfully used for binary CNNs.
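The slides do not show the formula, but the standard scaled-XNOR identity (as used in binary CNNs) is: for vectors with entries in {−1, +1}, the dot product equals 2 · popcount(XNOR(a, b)) − d. A minimal sketch in Python, with all names illustrative:

```python
import numpy as np

def pack(v):
    """Pack a +/-1 vector into an integer bit mask, one bit per dimension."""
    return sum(1 << i for i, x in enumerate(v) if x > 0)

def xnor_dot(a_bits, b_bits, dim):
    """Binary analogue of the dot product: XNOR the packed bits,
    count agreeing positions (popcount), then rescale so the result
    equals the +/-1 dot product: 2 * matches - dim."""
    matches = bin(~(a_bits ^ b_bits) & ((1 << dim) - 1)).count("1")
    return 2 * matches - dim

rng = np.random.default_rng(0)
a = rng.choice([-1.0, 1.0], size=32)
b = rng.choice([-1.0, 1.0], size=32)

# The XNOR form agrees with the ordinary dot product of the +/-1 vectors.
assert xnor_dot(pack(a), pack(b), 32) == int(np.dot(a, b))
```

On real hardware the same computation uses native XOR and popcount instructions on machine words, which is where the speedups on the next slide come from.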
  5. Benefits

     Space
     • Real-valued representations require 4 bytes per dimension.
     • 32 binary dimensions fit in 4 bytes.
     Speed
     • Two floating-point operations per dimension.
     • XNOR all 32 dimensions in two clock cycles.
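As a rough illustration of the storage claim (item counts and names here are illustrative, not from the talk), NumPy's `packbits` turns 32 sign bits into 4 bytes per item, a 32× reduction over float32:

```python
import numpy as np

n_items, dim = 100_000, 32

# Real-valued embeddings: float32, i.e. 4 bytes per dimension.
real = np.random.randn(n_items, dim).astype(np.float32)

# Binary embeddings: keep only the sign of each dimension and pack
# 8 bits per byte, so 32 dimensions fit in 4 bytes per item.
packed = np.packbits(real > 0, axis=1)

assert packed.shape == (n_items, 4)
print(real.nbytes // packed.nbytes)  # -> 32
```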
  6. Experiments

     Setup:
     • Standard learning-to-rank matrix factorization model
     • Evaluated on MovieLens 1M
     Key metrics:
     • MRR
     • Predictions per millisecond
  7. Models

     Baseline:
     • 2 embedding layers, for users and items
     • Negative sampling
     • BPR loss with tied weights
     • Adaptive hinge loss with tied weights
     Binary model:
     • Embeddings followed by a sign function
     • Trained by backpropagation
  8. Backpropagation

     Normal forward pass. In the backward pass, gradients are applied to the real-valued embedding layers. We can discard those once the model has been estimated.
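The slides do not name the gradient estimator used to backpropagate through the sign function; a common choice is the straight-through estimator. A minimal NumPy sketch under that assumption, with all names illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Real-valued embedding table: needed only during training, and
# discarded (reduced to its signs) once the model is estimated.
emb = rng.normal(scale=0.1, size=(100, 32))

def forward(ids):
    """Look up real-valued embeddings and binarise with a sign function."""
    real = emb[ids]
    return np.where(real >= 0, 1.0, -1.0), real

def backward(ids, grad_binary, real, lr=0.01):
    """Straight-through estimator: the gradient w.r.t. the binary
    output passes through the sign function unchanged (clipped to the
    region |x| <= 1) and is applied to the real-valued embeddings."""
    grad_real = grad_binary * (np.abs(real) <= 1.0)
    np.subtract.at(emb, ids, lr * grad_real)

ids = np.array([0, 3])
binary, real = forward(ids)
backward(ids, np.ones_like(binary), real)
```

At serving time only the signs of `emb`, packed into bit masks, need to be stored and scored.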
  9. Bottom line

     Moving from 1024 to 32 dimensions in the continuous model implies a 29 times increase in prediction speed at the expense of a modest 4% decrease in accuracy. Moving from a float representation to a 1024-dimensional binary representation implies a sharper accuracy drop of 6% in exchange for a smaller 20 times increase in prediction speed.