techsessions
December 07, 2017

The Inner Workings of Monzo’s Help Search Algorithm

Takeaways:
– Going beyond classification for a more general-purpose, flexible model;
– How we handle short-term discrepancies (e.g. an outage or other anomalies in queries);
– Our experience switching from TensorFlow to PyTorch;
– Some findings on PyTorch vs Keras.

Transcript

  1. We have ~800 of these saved responses. Approximately 70-80% of queries that come in can be handled by a saved response.
  2. We define ‘most similar’ as the pairing (q, a) that yields the highest cosine similarity.
  3. To find the cosine similarity of texts, we need to represent them as vectors. An encoder maps each text to a vector:
    – "I forgot my PIN" → [0.2, -3.0, …, 1.8, 0.3]
    – "I can reset it for you!" → [0.3, -2.2, …, 0.9, 0.4]
  4. So now we need to go from word vectors to paragraph vectors. The encoder takes one vector per word:
    – I → [0.2, 0.4, …, -0.2]
    – lost → [-0.1, 0.1, …, 1.1]
    – my → [0.5, 1.2, …, -2.8]
    – pin → [-1.4, -0.1, …, -0.5]
    – [-1.1, -0.4, …, 0.8]
    and produces a single paragraph vector: [0.9, -3.0, …, 1.8, 0.3]
  5. Our journey to build such an encoder:
    – Just take the average/max of the word vectors
    – Paragraph vectors (Mikolov et al., 2015): I had such high hopes
    – Paragraph vectors in combination with textual search (BM25, tf-idf)
    – Supervised pre-training and removing the last classification layer (Deng et al., 2009): doesn’t generalize well to less common phrases
    – Vanilla word-level RNN: looks good but routinely fails on longer sentences
  6. Our journey to build such an encoder (continued):
    – Hierarchical Attention Networks (Yang et al., 2016): nice! We used this in production for a while.
    – Transformer (Google, 2017): what we’re currently using.
  7. We train with triplets (IBM, Feng et al., 2015):
    – Q: How do I change my PIN?
    – A+: You can change your card PIN at any large bank (HSBC, Barclays, etc.) ATM in the UK by selecting PIN services
    – A-: You'll be pleased to know that we never charge you any fees for withdrawing money from an ATM
  8. … using a ranking loss / hinge loss objective function. m is a margin; we set it to 0.2 per the IBM paper (Feng et al., 2015). If cos(Q, A+) is large, the loss is 0, which is what we want.
  9. A few ‘tricks’ while training:
    – Replace too-hard or too-easy samples with semi-hard examples every few epochs of training (Google FaceNet, 2015)
    – Use the same weights for both questions and answers
    – Learning rate annealing (we found simple reduce-on-plateau to work better than fancy methods like SGDR)
  10. For customer support, the model runs on a GCE instance every minute (CPU only) and pushes to Intercom via its API.
  11. … which gave us actionable clustering of support queries. All fixed in the latest version of the app. Reduced queries by 10% (~800 queries a week).
  12. Do not train ranking problems as a classification problem:
    – "How do I change my PIN?" + "You can change your card PIN at any large bank (HSBC, Barclays, etc.) ATM in the UK by selecting PIN services" → 1
    – "How do I change my PIN?" + "You'll be pleased to know that we never charge you any fees for withdrawing money from an ATM" → 0
  13. Transformer does 20% better than HAN with the same number of parameters. See http://smerity.com/articles/2017/mixture_of_softmaxes.html
  14. Fine-tuning on short-term anomalies doesn’t work: catastrophic forgetting. Will try to implement Elastic Weight Consolidation (EWC) if I have time.
  15. Use PyTorch! It’s much more fun to code in. Loading a model (actual code in repo):

    TF:

        with tf.gfile.FastGFile(graph_path, 'rb') as f:
            graph_def = tf.GraphDef()
            graph_def.ParseFromString(f.read())
        with tf.Graph().as_default() as graph:
            tf.import_graph_def(graph_def, name='import')

    PyTorch:

        torch.load(checkpoint_files[0], map_location=lambda storage, loc: storage)
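
To make the retrieval step from slides 2 to 5 concrete, here is a minimal sketch in plain Python. It assumes the simplest encoder from slide 5 (averaging word vectors), not the Transformer used in production, and the toy vectors, the `toy_words` table and the function names are all made up for illustration.

```python
import math

def mean_pool(word_vectors):
    """Average equal-length word vectors into a single paragraph vector."""
    n = len(word_vectors)
    return [sum(dim) / n for dim in zip(*word_vectors)]

def cosine_similarity(u, v):
    """Cosine of the angle between u and v (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def most_similar(query_vec, answer_vecs):
    """Return (index, score) of the answer with the highest cosine similarity."""
    scores = [cosine_similarity(query_vec, a) for a in answer_vecs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]

# Hypothetical 3-dimensional word vectors; real embeddings are much larger.
toy_words = {
    "i": [0.2, 0.4, -0.2],
    "lost": [-0.1, 0.1, 1.1],
    "my": [0.5, 1.2, -2.8],
    "pin": [-1.4, -0.1, -0.5],
}
query = mean_pool([toy_words[w] for w in ["i", "lost", "my", "pin"]])

# Hypothetical pre-computed vectors for two saved responses.
answers = [[-1.0, 2.0, -3.0], [1.0, -1.0, 1.0]]
best_index, best_score = most_similar(query, answers)
```

With ~800 saved responses, the answer vectors can be encoded once and reused, so each incoming query costs one encoder pass plus ~800 dot products.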
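
The loss formula on slide 8 did not survive in the transcript. Assuming it follows the form in the cited IBM paper (Feng et al., 2015), loss = max(0, m - cos(Q, A+) + cos(Q, A-)), a per-triplet sketch looks like this. The function name is hypothetical; only the margin of 0.2 comes from the slide.

```python
def triplet_hinge_loss(cos_q_pos, cos_q_neg, margin=0.2):
    """Hinge ranking loss for one (Q, A+, A-) triplet.

    cos_q_pos: cosine similarity between the question and the correct answer.
    cos_q_neg: cosine similarity between the question and a wrong answer.
    The loss is zero once A+ beats A- by at least `margin`.
    """
    return max(0.0, margin - cos_q_pos + cos_q_neg)

# The correct answer is far ahead of the wrong one: nothing left to learn.
easy = triplet_hinge_loss(cos_q_pos=0.9, cos_q_neg=0.1)

# The wrong answer scores higher than the correct one: large loss.
hard = triplet_hinge_loss(cos_q_pos=0.3, cos_q_neg=0.6)
```

Unlike the 1/0 classification setup on slide 12, this objective only cares about the relative order of A+ and A- for the same question.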
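
The semi-hard sampling trick on slide 9 can be sketched as follows. This assumes the FaceNet definition translated to cosine similarity: a negative is semi-hard when it scores below the positive but within the margin, so the triplet still produces a non-zero loss without being dominated by mislabelled or ambiguous answers. The function names and scores are hypothetical, and the talk does not say how its own sampler is implemented.

```python
def is_semi_hard(cos_q_pos, cos_q_neg, margin=0.2):
    """True if the negative is less similar than the positive, but by less than `margin`."""
    return cos_q_pos - margin < cos_q_neg < cos_q_pos

def pick_semi_hard_negatives(cos_q_pos, negative_scores, margin=0.2):
    """Indices of candidate negatives that are semi-hard for this question."""
    return [
        i for i, score in enumerate(negative_scores)
        if is_semi_hard(cos_q_pos, score, margin)
    ]

# Hypothetical similarity scores between one question and four wrong answers.
# 0.1 is too easy (loss already 0), 0.9 is too hard (beats the positive).
selected = pick_semi_hard_negatives(0.8, [0.1, 0.7, 0.9, 0.65])
```

Re-running a selection like this every few epochs keeps the training triplets matched to what the current model still finds difficult.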