Slide 1

The Last MILE problem in ML @whiletruelearn

Slide 2

What I want to talk about today
- Some of the challenges I have faced taking ML models to production.
- Ideas for tackling them.

Slide 3

ELI5: ML vs. SE
SE: Input + Business Logic / Rules ⇒ Output, i.e. a hand-written f(Input) ⇒ Output
ML: Given the inputs and outputs, can we find a **good** f that generalizes?
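The SE-vs-ML contrast above can be sketched in a few lines. The data, class names, and the choice of scikit-learn are all illustrative, not from the talk:

```python
# SE: a human writes f by hand as business rules.
def rule_based_f(price: float) -> str:
    return "premium" if price > 100 else "budget"

# ML: given (input, output) pairs, fit f from examples.
from sklearn.tree import DecisionTreeClassifier

X = [[20], [50], [150], [300]]                   # inputs
y = ["budget", "budget", "premium", "premium"]   # outputs
f = DecisionTreeClassifier().fit(X, y)           # learned f(Input) -> Output
```

The learned `f` can then classify inputs it never saw, which is the "generalize" part of the question.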

Slide 4

Use case
- An ensemble of 50+ models that predicts a product category (7,000+ classes).
- Models need to be retrained quite often.
- Should automate as much as possible.

Slide 5

I have built a model. Now what?

Slide 6

“What is the cornerstone of science? Repeatable results.”

Slide 7

THERE IS `SCIENCE` IN DATA SCIENCE

Slide 8

It is very important to TRACK YOUR EXPERIMENTS.

Slide 9

ZOMBIE MODELS

Slide 10

MY WISHLIST
- Remember what data was used.
- Remember what code was used.
- Remember what hyperparameters / config were used.
- Remember what the results of the experiments are.
- Remember to save the model somewhere, maybe?
- Compare the results between different experiments.

Slide 11

Do all this and more in a non-intrusive manner.

Slide 12

pip install mlflow

Slide 13

DEMO

Slide 14

No content

Slide 15

Want a SCALABLE SOLUTION for MODEL TRAINING

Slide 16

MY WISHLIST
- Should be able to scale up as my training data increases.
- Should handle all my training dependencies well.
- Should execute remotely.
- Should be cost-effective.
- Should be able to train multiple models in parallel.
- Should be framework-agnostic.
- Should track everything about the experiment.
- Should allow optional deployment of the model built.

Slide 17

pip install sagemaker
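A hedged sketch of launching a remote training job with the SageMaker Python SDK's `Estimator`. The image URI, IAM role ARN, instance type, hyperparameters, and S3 path are all placeholders, and the `fit` call is left commented because it needs real AWS resources:

```python
# Hypothetical launch config; every name below is a placeholder.
hyperparameters = {"lr": 0.1, "epochs": 25, "word_ngrams": 2}

def make_estimator():
    from sagemaker.estimator import Estimator  # imported lazily
    return Estimator(
        image_uri="<account>.dkr.ecr.<region>.amazonaws.com/fasttext:latest",
        role="arn:aws:iam::<account>:role/SageMakerRole",
        instance_count=1,
        instance_type="ml.c5.xlarge",
        hyperparameters=hyperparameters,
    )

# estimator = make_estimator()
# estimator.fit({"train": "s3://<bucket>/train/"})  # runs remotely, billed per second
```

Because the training runs in a container you supply, this pattern is framework-agnostic, and multiple `fit` calls can run in parallel, matching the wishlist on the previous slide.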

Slide 18

No content

Slide 19

https://github.com/whiletruelearn/amazon-sage-maker-fasttext

Slide 20

CODE STRUCTURE & CODE QUALITY

Slide 21

No content

Slide 22

mlproject/
├── mlproject
│   ├── cli
│   ├── estimators
│   ├── preprocessing
│   ├── resources
│   ├── tests
│   ├── training
│   ├── utils
│   └── service
├── ops
├── notebooks
└── scripts

Slide 23

Why we need this
- ML in production is model + heuristics.
- Most of the time it is an ensemble of models.
- Important to ensure that the glue code is bug-free.
- Enables CI/CD-based deployment of models.
- Remember: good SE practices apply in ML as well.
- The build of this codebase should be an artifact from the CI/CD pipeline.
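Keeping the glue code bug-free in CI mostly comes down to ordinary unit tests, e.g. under the `tests` directory shown in the structure above. `normalize_title` is a made-up preprocessing step used only to illustrate the pattern:

```python
def normalize_title(title: str) -> str:
    """Toy glue-code step: lowercase a product title and collapse whitespace."""
    return " ".join(title.lower().split())

def test_normalize_title():
    # Runs under pytest in the CI/CD pipeline before a build artifact is cut.
    assert normalize_title("  Blue   SHIRT ") == "blue shirt"
```

Because glue code like this sits between the models in an ensemble, a one-line bug here can silently corrupt predictions, which is why it deserves the same CI discipline as any other software.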

Slide 24

Inference
- Real-time via APIs.
- Batch prediction using PySpark.
- Blue-green (BG) deployment based on a validation set.
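The batch path might look like the sketch below. The model interface (a `.predict(text)` method), the column names, and the Spark wiring are all assumptions; the pure `predict_batch` helper keeps the logic unit-testable without a Spark cluster:

```python
def predict_batch(titles, model):
    """Pure helper shared by the Spark job and unit tests."""
    return [model.predict(t) for t in titles]

def run_spark_job(spark, model, input_path, output_path):
    # Assumed wiring; requires pyspark at runtime.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    predict_udf = udf(model.predict, StringType())
    df = spark.read.parquet(input_path)  # assumed columns: product_id, title
    df.withColumn("category", predict_udf(df["title"])).write.parquet(output_path)
```

Splitting prediction logic out of the Spark job this way also lets the real-time API and the batch job share one code path, so the two inference modes cannot drift apart.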

Slide 25

Alternatives
- Kubeflow
- Polyaxon
- Alchemist

Slide 26

Monitor results. REPEAT, REPEAT WITH MORE DATA.

Slide 27

TAKEAWAYS
- Important to have your ML code structured properly from the get-go.
- Important to have artifacts of different experiments saved properly.
- Important to have a cost-effective yet scalable way of training models.
- Write good code.