The Inner Workings of Monzo’s Help Search Algorithm

Monzo’s Customer Service Augmentation Tech Sessions Nigel Ng Data Scientist

At Monzo, we want to provide world class customer service

But customer service enquiries scales linearly with number of users

How can we meet our goals of having response times
< 10 minutes?

Reduce number of inbound questions Increase productivity of customer support
agents

Here’s how we used ML to tackle those problems

Reducing queries: Enable natural language search for help content

Increasing productivity: Suggest saved responses to a support agent

We have ~800 of these saved responses. Approximately 70-80% of
queries that come in can be handled by a saved response.

We achieved those two goals with a single NLP model

Formulate this as an information retrieval problem:

Given an incoming question, find the most similar answer from
an answer pool

We define ‘most similar’ as the pairing (q, a) that
yields the highest cosine similarity

To find cosine similarity of texts, we need to represent
them as vectors Encoder I forgot my PIN I can reset it for you! [0.2, -3.0, …, 1.8, 0.3] [0.3, -2.2, …, 0.9, 0.4]

Word2vec works well on our data

Encoder So now we need to go from word vectors
to paragraph vectors I [0.2, 0.4, …, -0.2] lost [-0.1, 0.1, …, 1.1] my [0.5, 1.2, …, -2.8] pin [-1.4, -0.1, …, -0.5] [-1.1, -0.4, …, 0.8] [0.9, -3.0, …, 1.8, 0.3]

Our journey to build such an encoder Just take average/max
of the word vectors Paragraph vectors (Mikolov et al., 2015) - I had such high hopes ʭ Paragraph vectors in combination with textual search (BM25, tfidf) Supervised pre-training and removing the last classification layer (Deng et al., 2009) - doesn’t generalize well to less common phrases Vanilla word-level RNN - looks good but routinely fails on longer sentences

Our journey to build such an encoder Hierarchical Attention Networks
(Yang et al., 2016) - nice! We used this in production for a while. Transformer (Google, 2017) - What we’re currently using.

Training the Transformer Model

Before using transformer, our models were written in Keras +
TF

For the transformer model, we used Pytorch

We modify the architecture by only taking the encoder

~400k parameters (small model)

Customer service conversations are unstructured text data

We train with triplets... How do I change my PIN?
You can change your card PIN at any large bank (HSBC, Barclays, etc.) ATM in the UK by selecting PIN services ʠ You'll be pleased to know that we never charge you any fees for withdrawing money from an ATM ȓ Q A+ A- IBM, Feng et al. 2015

… using a ranking loss / hinge loss objective function
m is a margin, we set it to 0.2 per the IBM paper (Feng et al., 2015) If cos(Q, A+) is large, loss is 0, which is what we want

Train on GCE instance with 1xP100 GPU for 2-3 days

A few ‘tricks’ while training: Replace too-hard or too-easy samples
every few epochs of training (Google Facenet, 2015) with semi-hard examples Use the same weights for both questions and answers Learning rate annealing (we found simple reduce-when-plateau to work better than fancy methods like SGDR)

Serving the model

For customer support, model runs on a GCE instance every
minute (CPU only) Pushes to Intercom via API

For help search, serve as a Flask microservice in a
Docker container on Kubernetes

In both cases, we’re using the Pytorch model directly. No
exporting to Caffe2

(ms) It works well for our volumes x3 pods on
k8s

Raw TF protobuf on the same Docker specs crashes the
microservice

Unit tests to check prediction quality before pushing a new
model live

Results

Our saved response usage went up...

… which gave us actionable clustering of support queries All
fixed in latest version of the app Reduced queries by 10% (~800 queries a week)

We also use saved response predictions as classification

Users find what they need XX % of the time
(recall @ 5)

Search also allows further insights into users problems

Other learnings

Do not train ranking problems as a classification problem How
do I change my PIN? You can change your card PIN at any large bank (HSBC, Barclays, etc.) ATM in the UK by selecting PIN services ʠ You'll be pleased to know that we never charge you any fees for withdrawing money from an ATM ȓ How do I change my PIN? 1 0

Transformer does 20% better than HAN with same number of
parameters http://smerity.com/articles/2017/mixture_of_softmaxes.html

Keras is a great tool for rapid prototyping, but impossible
to implement more complex models

Despite being a dynamic graph, Pytorch performs better than TF
in both training and serving

… and the size of the binary is smaller (70mb
vs 110mb)

Only scenario in which TF is faster than Pytorch is
CPU training

Use right level of abstraction Classification Ranking Text generation Lower
abstraction / more complex model

Fine-tuning on short term anomalies doesn’t work - catastrophic forgetting
Will try to implement Elastic Weight Consolidation (EWC) if I have time

Use Pytorch! It’s much more fun to code in with
tf.gfile.FastGFile(graph_path, 'rb') as f: graph_def = tf.GraphDef() graph_def.ParseFromString(f.read()) with tf.Graph().as_default() as graph: tf.import_graph_def(graph_def, name='import') torch.load(checkpoint_files[ 0], map_location=lambda storage, loc: storage) Loading model (actual code in repo) TF Pytorch

Key takeaways

Build multi-purpose models to avoid technical debt Monitor metrics and
actual usage habits Give Pytorch a go!

Thank you

The Inner Workings of Monzo’s Help Search Algor...

The Inner Workings of Monzo’s Help Search Algorithm

More Decks by techsessions

Other Decks in Technology

Featured

Transcript