The last mile problem in ML

Draft slides for presentation

Krishna Sangeeth

April 10, 2019

Transcript

  1. What I wish to talk about today
     - Some of the challenges I had faced in taking ML models to production.
     - Ideas on tackling them.
  2. ELI5 ML
     - SE: Input + Business Logic / Rules ⇒ Output, i.e. a hand-written f(Input) ⇒ Output.
     - ML: Given Input and Output, can we find a **good** f that would generalize? (toy sketch below)
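
A toy Python sketch of the SE vs ML framing above. The rule, the `order_value` feature, and the scikit-learn model are illustrative assumptions, not from the deck.

```python
from sklearn.linear_model import LogisticRegression

# SE: f is written by hand as business rules.
def f_rules(order_value: float) -> str:
    return "priority" if order_value > 100 else "standard"

# ML: f is learned from (Input, Output) pairs and should generalize to new inputs.
X = [[20.0], [50.0], [120.0], [300.0]]                 # inputs
y = ["standard", "standard", "priority", "priority"]   # observed outputs
f_learned = LogisticRegression().fit(X, y)

print(f_rules(80.0), f_learned.predict([[80.0]])[0])
```
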
  3. Use case
     - An ensemble of 50+ models that predicts a product category (7000+ classes). (illustrative ensemble sketch below)
     - Models need to be retrained quite often.
     - Should automate as much as possible.
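
How the 50+ models are combined is not spelled out in the deck; below is a hypothetical averaging ensemble over scikit-learn style `predict_proba` outputs, assuming all members share the same class ordering.

```python
import numpy as np

def ensemble_predict(models, X, classes):
    """Average per-model class probabilities and return the top class per row."""
    proba = np.mean([m.predict_proba(X) for m in models], axis=0)
    return [classes[i] for i in proba.argmax(axis=1)]
```
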
  4. MY WISHLIST
     - Remember what data was used.
     - Remember what code was used.
     - Remember what hyperparameters / config was used.
     - Remember what the results of experiments are.
     - Remember to save the model somewhere maybe?
     - Compare the results between different experiments. (a tracking sketch follows this slide)
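
One hedged way to cover the wishlist above is an experiment tracker; the sketch below uses MLflow purely as an example (the deck does not name a tool), with toy data and made-up parameter values and paths.

```python
import mlflow
import mlflow.sklearn
from sklearn.linear_model import LogisticRegression

X_train, y_train = [[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1]
X_val, y_val = [[0.5], [2.5]], [0, 1]

with mlflow.start_run():
    mlflow.log_param("training_data", "s3://bucket/products/2019-04-01")  # remember the data
    mlflow.set_tag("git_commit", "abc1234")                               # remember the code
    mlflow.log_param("C", 1.0)                                            # remember hyperparameters

    model = LogisticRegression(C=1.0).fit(X_train, y_train)
    mlflow.log_metric("val_accuracy", model.score(X_val, y_val))          # remember the results

    mlflow.sklearn.log_model(model, "model")                              # save the model somewhere
```

Comparing results between experiments then amounts to comparing runs in the tracker's UI.
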
  5. MY WISHLIST
     - Should be able to scale up as my training data increases.
     - Should handle all my training dependencies well.
     - Should be executed remotely.
     - Should be cost effective.
     - Should be able to train multiple models in parallel.
     - Should be framework agnostic.
     - Should track everything about the experiment.
     - Should allow optional deployment of the model built. (a job-spec sketch follows this slide)
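
A hypothetical, framework-agnostic training job spec that captures the second wishlist; every field name, the `mlproject.training.train` entrypoint, and the S3 paths are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class TrainingJob:
    name: str
    docker_image: str                  # pins all training dependencies
    entrypoint: str                    # framework agnostic: any command can run
    data_uri: str                      # scales with growing training data
    instance_type: str = "spot-cpu"    # cost-effective remote execution
    parallelism: int = 1               # train multiple models in parallel
    track_experiment: bool = True      # track everything about the experiment
    deploy_on_success: bool = False    # optional deployment of the built model

jobs = [
    TrainingJob(
        name=f"category-model-{i}",
        docker_image="mlproject:latest",
        entrypoint="python -m mlproject.training.train",
        data_uri=f"s3://bucket/categories/shard-{i}",
        parallelism=4,
    )
    for i in range(3)
]
```
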
  6. mlproject/
     ├── mlproject
     │   ├── cli
     │   ├── estimators
     │   ├── preprocessing
     │   ├── resources
     │   ├── tests
     │   ├── training
     │   ├── utils
     │   └── service
     ├── ops
     ├── notebooks
     └── scripts
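
A sketch of what the `cli` package in the layout above might expose, using only the standard-library `argparse`; the subcommands and their flags are assumptions, not taken from the actual repository.

```python
import argparse

def main():
    parser = argparse.ArgumentParser(prog="mlproject")
    sub = parser.add_subparsers(dest="command", required=True)

    train = sub.add_parser("train", help="train a category model")
    train.add_argument("--config", required=True)

    serve = sub.add_parser("serve", help="start the prediction service")
    serve.add_argument("--port", type=int, default=8080)

    args = parser.parse_args()
    if args.command == "train":
        print(f"training with config {args.config}")   # would dispatch to mlproject.training
    else:
        print(f"serving on port {args.port}")          # would dispatch to mlproject.service

if __name__ == "__main__":
    main()
```
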
  7. Why we need this
     - ML in production is model + heuristics.
     - It’s most of the time an ensemble of models.
     - Important to ensure that the glue code is bug free. (a test sketch follows this slide)
     - Enables a CI/CD based deployment of models.
     - Remember good SE practices apply in ML as well.
     - The build of this codebase should be an artifact from the CI/CD pipeline.
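
For the "glue code should be bug free" point, a minimal pytest sketch; `normalize_title` and its expected behaviour are invented here to illustrate testing preprocessing glue.

```python
import pytest

def normalize_title(title: str) -> str:
    """Toy preprocessing step: lowercase and collapse whitespace."""
    return " ".join(title.lower().split())

@pytest.mark.parametrize("raw,expected", [
    ("  Blue  SHIRT ", "blue shirt"),
    ("Laptop\t15in", "laptop 15in"),
    ("", ""),
])
def test_normalize_title(raw, expected):
    assert normalize_title(raw) == expected
```
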
  8. Inference
     - Realtime via APIs.
     - Batch prediction using PySpark. (scoring sketch below)
     - Blue-green (BG) deployment based on a validation set.
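
A hedged sketch of the PySpark batch prediction path: broadcast a trained model to the executors and score with a plain UDF. The model file, column names, and S3 paths are illustrative assumptions.

```python
import joblib
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("batch-category-prediction").getOrCreate()

# assumed: a pickled scikit-learn pipeline produced by the training code
model = joblib.load("model.pkl")
bc_model = spark.sparkContext.broadcast(model)

@udf(returnType=StringType())
def predict_category(title):
    # each executor reuses the broadcast model instead of reloading it per row
    return str(bc_model.value.predict([title])[0])

products = spark.read.parquet("s3://bucket/products/")
scored = products.withColumn("category", predict_category("title"))
scored.write.mode("overwrite").parquet("s3://bucket/predictions/")
```

A blue-green style cut-over would then promote the newly scored model only after its validation-set results look at least as good as the live one's.
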
  9. TAKEAWAYS
     - Important to have your ML code structured properly from the get-go.
     - Important to have artifacts of different experiments saved properly.
     - Important to have a cost effective yet scalable way of training models.
     - Write good code.