
The last mile problem in ML


Draft slides for presentation


Krishna Sangeeth

April 10, 2019


Transcript

  1. The Last MILE problem in ML @whiletruelearn

  2. What I wish to talk about today
    - Some of the challenges I have faced taking ML models to production.
    - Ideas for tackling them.
  3. ELI5 ML
    SE: Input + Business Logic / Rules ⇒ Output, i.e. a hand-written f(Input) ⇒ Output.
    ML: Given Input and Output, can we find a **good** f that would generalize?
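The SE-vs-ML contrast on this slide can be sketched in Python. This is a toy illustration, not anything from the deck; all names and the keyword-voting "learner" are invented:

```python
from collections import Counter, defaultdict

# SE: f is written by hand as business logic / rules.
def categorize_by_rules(title: str) -> str:
    if "phone" in title.lower():
        return "electronics"
    return "other"

# ML: f is *learned* from (input, output) pairs.
# Toy "learner": remember which label each word co-occurs with most.
def learn_f(examples):
    word_labels = defaultdict(Counter)
    for text, label in examples:
        for word in text.lower().split():
            word_labels[word][label] += 1

    def f(text: str) -> str:
        votes = Counter()
        for word in text.lower().split():
            votes += word_labels.get(word, Counter())
        return votes.most_common(1)[0][0] if votes else "other"

    return f

f = learn_f([("android phone case", "electronics"),
             ("running shoes", "apparel")])
```

The point: in the ML column, changing the behaviour of `f` means changing the training data, not the code, which is exactly why the data and experiments need tracking later in the deck.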
  4. Use case
    - An ensemble of 50+ models that predicts a product category (7,000+ classes).
    - Models need to be retrained quite often.
    - Should automate as much as possible.
  5. I have built a model now; what next?

  6. “What is the cornerstone of science? Repeatable results.”

  7. THERE IS `SCIENCE` IN DATA SCIENCE

  8. It is very important to TRACK YOUR EXPERIMENTS.

  9. ZOMBIE MODELS

  10. MY WISHLIST
    - Remember what data was used.
    - Remember what code was used.
    - Remember what hyperparameters / config were used.
    - Remember what the results of experiments were.
    - Remember to save the model somewhere, maybe?
    - Compare the results between different experiments.
  11. Do all this and more in a non-intrusive manner.

  12. pip install mlflow

  13. DEMO

  14. None
  15. Want a SCALABLE SOLUTION for MODEL TRAINING

  16. MY WISHLIST
    - Should be able to scale up as my training data increases.
    - Should handle all my training dependencies well.
    - Should be executed remotely.
    - Should be cost effective.
    - Should be able to train multiple models in parallel.
    - Should be framework agnostic.
    - Should track everything about the experiment.
    - Should allow optional deployment of the model built.
  17. pip install sagemaker

  18. None
  19. https://github.com/whiletruelearn/amazon-sage-maker-fasttext

  20. CODE STRUCTURE & CODE QUALITY

  21. None
  22. mlproject/
      ├── mlproject
      │   ├── cli
      │   ├── estimators
      │   ├── preprocessing
      │   ├── resources
      │   ├── tests
      │   ├── training
      │   ├── utils
      │   └── service
      ├── ops
      ├── notebooks
      └── scripts
  23. Why we need this
    - ML in production is model + heuristics.
    - Most of the time it is an ensemble of models.
    - Important to ensure that the glue code is bug free.
    - Enables CI/CD-based deployment of models.
    - Remember: good SE practices apply in ML as well.
    - The build of this codebase should be an artifact from the CI/CD pipeline.
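One way to read "glue code should be bug free": unit-test the preprocessing and heuristics like any other code. A hypothetical sketch (the function names and the price heuristic are invented for illustration):

```python
def normalize_title(title: str) -> str:
    # Glue/preprocessing code: lowercase and collapse whitespace.
    return " ".join(title.lower().split())

def apply_price_heuristic(category: str, price: float) -> str:
    # Example heuristic layered on top of the model's prediction.
    if category == "electronics" and price < 1.0:
        return "suspicious-listing"
    return category

# Tests like these run in the CI/CD pipeline before any model ships.
assert normalize_title("  Red   SHOES ") == "red shoes"
assert apply_price_heuristic("electronics", 0.5) == "suspicious-listing"
assert apply_price_heuristic("books", 0.5) == "books"
```

The model itself may be hard to test deterministically, but everything around it is ordinary code and deserves ordinary tests.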
  24. Inference
    - Realtime via APIs.
    - Batch prediction using PySpark.
    - Blue-green (BG) deployment based on a validation set.
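The blue-green promotion rule can be sketched as a simple gate: switch traffic to the candidate ("green") model only if it beats the live ("blue") model on a held-out validation set. Names, the metric, and the margin are illustrative:

```python
def accuracy(model, validation_set) -> float:
    correct = sum(1 for x, y in validation_set if model(x) == y)
    return correct / len(validation_set)

def should_promote(blue, green, validation_set, margin: float = 0.0) -> bool:
    # Promote green only if it is at least `margin` better than blue.
    return accuracy(green, validation_set) >= accuracy(blue, validation_set) + margin

# Toy models and validation data:
blue = lambda x: "a"
green = lambda x: "a" if x < 5 else "b"
val = [(1, "a"), (2, "a"), (7, "b"), (9, "b")]
```

Gating on a fixed validation set makes the frequent retraining from slide 4 safe: a bad retrain simply fails the gate and blue keeps serving.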
  25. Alternatives: Kubeflow, Polyaxon, Alchemist

  26. Monitor results, REPEAT, REPEAT WITH MORE DATA.

  27. TAKEAWAYS
    - Important to have your ML code structured properly from the get-go.
    - Important to have artifacts of different experiments saved properly.
    - Important to have a cost-effective yet scalable way of training models.
    - Write good code.