Machine Learning Engineering principles with Python and MLFlow by Natu Lauchande

Pycon ZA
October 10, 2019

Machine Learning is a very hyped topic at the moment. While many talks and presentations cover the data science component, very few cover the nitty-gritty details of a machine learning pipeline. This talk focuses on the engineering side of Machine Learning, covering best practices and strategies for the architecture and design of Machine Learning systems. We will delve into the essence of Uber's Michelangelo, Airbnb's Bighead and Facebook's FBLearner. During the talk, I will use MLFlow and Python to build an open-source solution, similar to those of the big tech companies, suited to the everyday tech startup. The entire cycle of training, deployment, monitoring, champion/challenger testing and the serving layer will be addressed. Technical debt prevention is another topic that will be addressed at the end of the talk.

Transcript

  1. Machine Learning Engineering Principles with Python and MLFlow (PyconZA-2019), Natu Lauchande (Principal Data Engineer @ Jumo.World)
  2. AGENDA
    In scope:
    1. ML engineering pain points
    2. A sample of relevant ML pipeline architectures used across industries
    3. ML engineering principles to address some of the pain points in the ML development process
    4. Examples of using MLFlow and other tools to implement these principles
    Out of scope:
    1. A detailed review of ML systems
    2. Any specific data science or algorithmic solution
    3. A tutorial on using any specific tool (e.g. MLFlow)
  3. Context: Machine Learning Is Easily Accessible

  4. Pain Points - Issues Getting Models from Dev to Production

  5. Pain Points - Data Pipeline Entanglement

  6. Pain Points - Experimentation Management

  7. Machine Learning Engineering Pain Points - Metrics

  8. How Do We Crack These Issues?

  9. ML Engineering

  10. Idealized Workflow of a Machine Learning Project

  11. ML Engineering

  12. ML Code in the Context of an ML-Backed System

  13. Relevant Machine Learning Engineering Principles
    • A software engineering approach and a platform view of how your ML operates (CD4ML, testing, MLOps)
    • An integrated, reliable data lifecycle
    • Reproducible and reliable model management
    • Experimentation and rapid prototyping facilities
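One way to make the CD4ML and testing principle concrete is an automated quality gate that a CI job runs before a model is promoted. The sketch below assumes metrics arrive as a plain dict and that higher values are better; the helper name `check_quality_gate` and the threshold values are hypothetical and not part of MLFlow or any CD4ML tool.

```python
from typing import Dict, List


def check_quality_gate(
    metrics: Dict[str, float], min_thresholds: Dict[str, float]
) -> List[str]:
    """Return human-readable failures; an empty list means the gate passes.

    Every metric named in `min_thresholds` must exist in `metrics` and be at
    least the given minimum (higher-is-better metrics are assumed).
    """
    failures: List[str] = []
    for name, minimum in min_thresholds.items():
        if name not in metrics:
            failures.append(f"missing metric: {name}")
        elif metrics[name] < minimum:
            failures.append(f"{name}={metrics[name]:.3f} below minimum {minimum:.3f}")
    return failures


# Hypothetical thresholds agreed by the team, and metrics from a candidate run.
thresholds = {"accuracy": 0.55, "auc": 0.60}
candidate = {"accuracy": 0.58, "auc": 0.57}
gate_failures = check_quality_gate(candidate, thresholds)
print(gate_failures)  # a CI job would fail the build when this list is non-empty
```

In a pytest suite run by CI, the same check becomes `assert check_quality_gate(metrics, thresholds) == []`, so a metric regression blocks the deployment stage instead of reaching production.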
  14. Uber Michelangelo

  15. Uber Michelangelo PyML

  16. Uber Michelangelo Feature Store

  17. Facebook FBLearner

  18. Airbnb Bighead

  19. Machine Learning Platforms Comparative View

  20. DIY ML Engineering Pipeline

  21. Ideal Pipeline - Real-Time Monitoring Perspective (D. Sculley - https://www.youtube.com/watch?v=V18AsBIHlWs; ML Score - https://storage.googleapis.com/pub-tools-public-publication-data/pdf/45742.pdf)
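A common way to watch a live model, in the spirit of the real-time monitoring view above, is to compare the distribution of current prediction scores with a reference window. The sketch below uses the population stability index (PSI), a technique chosen here for illustration rather than taken from the talk; the 10 bins, the 1e-6 smoothing term and the 0.2 alert threshold are conventional assumptions, and scores are assumed to be probabilities in [0, 1].

```python
import math
from typing import List, Sequence


def bin_shares(scores: Sequence[float], bins: int) -> List[float]:
    """Fraction of scores in each of `bins` equal-width bins over [0, 1]."""
    counts = [0] * bins
    for score in scores:
        clamped = min(max(score, 0.0), 1.0)  # assume scores are probabilities
        counts[min(int(clamped * bins), bins - 1)] += 1
    return [count / len(scores) for count in counts]


def population_stability_index(
    reference: Sequence[float],
    current: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI = sum((a - e) * ln(a / e)) over bins; 0 means identical histograms."""
    psi = 0.0
    for e, a in zip(bin_shares(reference, bins), bin_shares(current, bins)):
        e, a = max(e, eps), max(a, eps)  # smooth empty bins to avoid log(0)
        psi += (a - e) * math.log(a / e)
    return psi


# Hypothetical scores: a uniform reference window vs. a day where every score is 0.9.
reference_scores = [i / 100 for i in range(100)]
todays_scores = [0.9] * 100
drift = population_stability_index(reference_scores, todays_scores)
print(f"PSI={drift:.2f}, alert={drift > 0.2}")
```

PSI only needs the model's outputs, so it can raise an alert before the true labels arrive; it complements, rather than replaces, tracking accuracy once outcomes are known.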
  22. Data Lifecycle - Examples of data versioning tools: 1. Delta Lake 2. DVC
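The core idea behind data versioning is to identify each dataset by a hash of its content, so that a training run can record exactly which data it used. The stdlib-only sketch below illustrates that idea; it is not the API of DVC or Delta Lake, and the `dataset_version` helper, the 12-character truncation and the `train.csv` file name are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Tuple


def dataset_version(path: Path, chunk_size: int = 1 << 20) -> str:
    """Short SHA-256 content hash: identical bytes always give the same version."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large datasets need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()[:12]


def demo() -> Tuple[str, str, str]:
    """Hash a file, edit one value, then restore it and hash again."""
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "train.csv"  # hypothetical file name
        data.write_text("x,y\n1,0\n")
        v1 = dataset_version(data)
        data.write_text("x,y\n2,0\n")
        v2 = dataset_version(data)
        data.write_text("x,y\n1,0\n")
        v1_again = dataset_version(data)
    return v1, v2, v1_again


print(demo())
```

Logging this version string with each training run (for example via `mlflow.log_param`) ties a model back to the exact bytes it was trained on.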
  23. Data Lifecycle Tooling - Examples of data validation tools: 1. Deequ 2. Great Expectations
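Tools like Deequ and Great Expectations let you declare expectations about a dataset and stop a pipeline when they are violated. The plain-Python sketch below shows only that pattern; the expectation helpers, the `validate` function and the column names are hypothetical and do not mirror either library's API.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Check = Callable[[List[Row]], bool]


def expect_not_null(column: str) -> Check:
    """Every row must have a non-null value in `column`."""
    return lambda rows: all(row.get(column) is not None for row in rows)


def expect_between(column: str, low: float, high: float) -> Check:
    """Every non-null value in `column` must lie in [low, high]."""
    return lambda rows: all(
        low <= row[column] <= high for row in rows if row.get(column) is not None
    )


def validate(rows: List[Row], checks: Dict[str, Check]) -> List[str]:
    """Run every named check and return the names of those that failed."""
    return [name for name, check in checks.items() if not check(rows)]


# Hypothetical batch: the second row has a missing price and a negative volume.
sample_rows = [
    {"close": 101.5, "volume": 12.0},
    {"close": None, "volume": -3.0},
]
sample_checks = {
    "close_not_null": expect_not_null("close"),
    "volume_non_negative": expect_between("volume", 0.0, float("inf")),
}
failed = validate(sample_rows, sample_checks)
print(failed)  # a pipeline would stop here rather than train on bad data
```

Keeping null handling in its own expectation means each failure name points at a single, specific problem in the data.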
  24. Data Lifecycle Tooling - Feature Management with Feast (gojek/feast, by GO-JEK and Google Cloud; still WIP):
    • Supports ingesting feature data via batch or streaming
    • Provides a low-latency API
    • Enables discovery and documentation of features
    • Provides an overview of the general health of features in the system
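At its core, the batch/streaming ingestion and low-latency serving listed above amount to keeping the latest value of each feature per entity. The toy in-memory sketch below assumes string entity IDs and that the newest event timestamp wins; `InMemoryFeatureStore`, its methods and the feature names are hypothetical and are not Feast's API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, Optional, Tuple

# (entity_id, feature_name, value, event_timestamp)
FeatureRow = Tuple[str, str, Any, float]


@dataclass
class InMemoryFeatureStore:
    """Hypothetical toy store: latest value per (entity, feature)."""

    # entity_id -> feature_name -> (event_timestamp, value)
    online: Dict[str, Dict[str, Tuple[float, Any]]] = field(default_factory=dict)

    def ingest(self, row: FeatureRow) -> None:
        """Streaming path: apply one event; older events never overwrite newer ones."""
        entity_id, name, value, ts = row
        features = self.online.setdefault(entity_id, {})
        current = features.get(name)
        if current is None or ts >= current[0]:
            features[name] = (ts, value)

    def ingest_batch(self, rows: Iterable[FeatureRow]) -> None:
        """Batch path: reuses the streaming logic so both paths agree."""
        for row in rows:
            self.ingest(row)

    def get_online_features(
        self, entity_id: str, names: Iterable[str]
    ) -> Dict[str, Optional[Any]]:
        """Low-latency read: one dict lookup per requested feature; None if unknown."""
        features = self.online.get(entity_id, {})
        return {n: features[n][1] if n in features else None for n in names}


store = InMemoryFeatureStore()
# A late-arriving older event (timestamp 50.0) must not replace the newer one.
store.ingest_batch([
    ("user_1", "avg_session_minutes", 12.5, 100.0),
    ("user_1", "avg_session_minutes", 9.0, 50.0),
])
store.ingest(("user_1", "messages_last_24h", 4, 120.0))
print(store.get_online_features("user_1", ["avg_session_minutes", "messages_last_24h", "unknown"]))
```

Routing batch loads through the same `ingest` logic as streaming events is the design point: both paths then agree on which value is current.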
  25. MLFlow: an open-source machine learning platform that addresses the following issues:
    1. Experimentation management: a UI to manage and log different models
    2. Reproducibility: runs the same way anywhere
    3. Standardization of ML projects: designed to scale for 1 or 100 000 orgs
    4. Standardization of deployment operations into multiple runtimes (AWS SageMaker, Azure ML, Google Cloud, Kubernetes, etc.)
  26. 26

  27. 27

  28. Machine Learning Principles: Databricks Blueprint for an Idealized Pipeline and the Role of MLFlow
  29. 29

  30. Demo Example I - Bitpred - Data (https://github.com/nlauchande/bitpred)

  31. ML Problem - Bitpred - Toy Problem Statement: Will the price of bitcoin move up or down in the next 24 hours? (https://github.com/nlauchande/bitpred)
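Framing the Bitpred question as supervised learning requires a binary label per observation. The sketch below assumes an oldest-first list of daily closing prices, so that "the next 24 hours" means the next close; the `make_direction_labels` name and the choice to label an unchanged price as "down" (0) are assumptions, not necessarily what the bitpred repository does.

```python
from typing import List


def make_direction_labels(closes: List[float]) -> List[int]:
    """Label day t as 1 if the next day's close is higher than day t's, else 0.

    Assumes `closes` is sorted oldest-first with one close per day. The last day
    has no next close, so the result is one element shorter than the input.
    """
    return [1 if nxt > cur else 0 for cur, nxt in zip(closes, closes[1:])]


# Made-up prices, not real bitcoin data.
print(make_direction_labels([100.0, 105.0, 103.0, 103.0]))  # [1, 0, 0]
```

Because each label looks one day ahead, the features for day t must only use information available at the end of day t; otherwise the model is trained on a leaked answer.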
  32. ML Problem - Bitpred - Project Structure - MLproject File

  33. ML Problem - Bitpred - Project Structure - Conda File
  34. ML Problem - Bitpred - train.py with Experiment Logging
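In MLFlow, a training script records each run's parameters and metrics with calls such as `mlflow.start_run()`, `mlflow.log_param()` and `mlflow.log_metric()`. To show what such an experiment log captures without requiring an MLFlow installation, the sketch below uses a hypothetical stdlib-only `RunLogger` that writes one JSON file per run; it illustrates the idea and is not how MLFlow stores runs. The `window_days` parameter and the metric value are placeholders.

```python
import json
import tempfile
import time
import uuid
from pathlib import Path
from typing import Any, Dict


class RunLogger:
    """Hypothetical minimal tracker: one JSON record per training run."""

    def __init__(self, root: Path, experiment: str) -> None:
        self.run_id = uuid.uuid4().hex
        self.path = root / experiment / f"{self.run_id}.json"
        self.record: Dict[str, Any] = {
            "run_id": self.run_id,
            "start_time": time.time(),
            "params": {},
            "metrics": {},
        }

    def log_param(self, key: str, value: Any) -> None:
        self.record["params"][key] = value

    def log_metric(self, key: str, value: float) -> None:
        self.record["metrics"][key] = float(value)

    def __enter__(self) -> "RunLogger":
        return self

    def __exit__(self, *exc_info: Any) -> None:
        # Persist even if training raised, so failed runs remain visible.
        self.record["end_time"] = time.time()
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(self.record, indent=2))


def example_run() -> Dict[str, Any]:
    """Log one hypothetical run into a temporary directory and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        with RunLogger(Path(tmp), "bitpred") as run:
            run.log_param("window_days", 7)  # hypothetical hyperparameter
            run.log_metric("accuracy", 0.5)  # placeholder value, not a real result
        return json.loads(run.path.read_text())


print(example_run()["params"])
```

Using a context manager mirrors the `with mlflow.start_run():` style: the run is closed and saved however the training block exits.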

  35. ML Problem - Bitpred - Running Your Training Pipeline in a Reproducible Manner
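MLFlow Projects make a run repeatable by pinning the entry point (the MLproject file) and the environment (the Conda file), so the same project can be launched with `mlflow run <project-uri>`. Two further ingredients are a fixed random seed and a record of the runtime environment. The stdlib sketch below illustrates both; `seeded_shuffle`, `run_fingerprint` and the seed value 42 are hypothetical.

```python
import platform
import random
import sys
from typing import Dict, List


def seeded_shuffle(items: List[int], seed: int) -> List[int]:
    """Shuffle with a private RNG so the same seed always yields the same order."""
    rng = random.Random(seed)  # isolated from the global `random` state
    shuffled = list(items)
    rng.shuffle(shuffled)
    return shuffled


def run_fingerprint(seed: int) -> Dict[str, str]:
    """Facts worth logging with every run so it can be re-created later."""
    return {
        "seed": str(seed),
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "executable": sys.executable,
    }


SEED = 42  # hypothetical; any fixed value works
print(seeded_shuffle(list(range(10)), SEED))
print(run_fingerprint(SEED))
```

A private `random.Random` instance keeps the result stable even if other code touches the global RNG. Libraries such as NumPy and scikit-learn keep their own random state (for example a `random_state` argument), so they need seeding separately.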
  36. ML Problem - Demo MLFlow UI

  37. ML Problem - Demo MLFlow UI

  38. ML Problem - MLFlow Immediate Access to a Serving Layer (Batch, REST API): Running the Prediction Service; Calling the REST API
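A model served locally with `mlflow models serve -m <model-uri>` exposes an `/invocations` endpoint that accepts a JSON body describing a table of inputs. The client sketch below assumes the pandas "split" layout and the `application/json; format=pandas-split` content type used by MLFlow 1.x scoring servers; MLFlow 2.x instead expects the payload wrapped in a `dataframe_split` key, so check your version. The URL, port and column names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Sequence


def build_invocation_request(
    url: str, columns: Sequence[str], rows: Sequence[Sequence[Any]]
) -> urllib.request.Request:
    """Build (but do not send) a POST to a model server's /invocations endpoint.

    The body uses the pandas "split" layout assumed for MLFlow 1.x servers.
    """
    payload = {"columns": list(columns), "data": [list(row) for row in rows]}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; format=pandas-split"},
        method="POST",
    )


def predict(request: urllib.request.Request, timeout: float = 5.0) -> Any:
    """Send the request and return the decoded JSON predictions."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical usage against a locally running server:
# predict(build_invocation_request("http://127.0.0.1:5000/invocations", ["f1", "f2"], [[1, 2]]))
```

Separating request construction from sending keeps the payload format testable without a running server.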
  39. Predictions Monitoring

  40. Predictions Monitoring Dashboard Example
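For a problem like Bitpred, the true outcome of each prediction is known 24 hours later, so a monitoring dashboard can track accuracy over a rolling window. The sketch below assumes 0/1 direction labels; the class name, the window size and the 0.5 alert threshold (chance level for a balanced up/down problem) are hypothetical choices.

```python
from collections import deque
from typing import Deque, Optional


class RollingAccuracyMonitor:
    """Accuracy over the last `window` resolved predictions, with a simple alert rule."""

    def __init__(self, window: int = 30, alert_below: float = 0.5) -> None:
        self.window = window
        self.outcomes: Deque[bool] = deque(maxlen=window)
        self.alert_below = alert_below

    def record(self, predicted: int, actual: int) -> None:
        """Call once the true outcome (e.g. 24 hours later) is known."""
        self.outcomes.append(predicted == actual)

    def accuracy(self) -> Optional[float]:
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def should_alert(self) -> bool:
        """Alert only once the window is full, to avoid noise from tiny samples."""
        acc = self.accuracy()
        return len(self.outcomes) == self.window and acc is not None and acc < self.alert_below


# Hypothetical stream of (predicted, actual) direction labels.
monitor = RollingAccuracyMonitor(window=4, alert_below=0.5)
for predicted, actual in [(1, 1), (1, 0), (0, 1), (0, 1)]:
    monitor.record(predicted, actual)
print(monitor.accuracy(), monitor.should_alert())
```

The bounded `deque` discards old outcomes automatically, so the metric reflects recent behaviour rather than the model's whole history.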

  41. Demo Example II - ML Pipeline for a PTSD Chatbot Risk Classifier

  42. Demo Example - Sample sklearn Pipeline

  43. Demo Example - Log Your Model

  44. Demo Example - Track Experiments

  45. Demo Example - Model Tracking and Versioning

  46. Demo Example - Model Tracking and Versioning
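Model versioning becomes most useful when combined with the champion/challenger testing mentioned in the talk abstract: each new model is registered as a version, and it replaces the current champion only if it wins on an agreed metric. The toy in-memory sketch below does not use MLFlow; `ModelRegistry`, `promote_if_better`, the `f1` metric, the 0.01 minimum gain and the metric values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ModelVersion:
    version: int
    run_id: str
    metrics: Dict[str, float]


@dataclass
class ModelRegistry:
    """Hypothetical in-memory registry with one champion per model name."""

    versions: Dict[str, List[ModelVersion]] = field(default_factory=dict)
    champions: Dict[str, int] = field(default_factory=dict)

    def register(self, name: str, run_id: str, metrics: Dict[str, float]) -> ModelVersion:
        """Append a new, immutable version number for `name`."""
        history = self.versions.setdefault(name, [])
        mv = ModelVersion(version=len(history) + 1, run_id=run_id, metrics=dict(metrics))
        history.append(mv)
        return mv

    def champion(self, name: str) -> Optional[ModelVersion]:
        version = self.champions.get(name)
        return None if version is None else self.versions[name][version - 1]

    def promote_if_better(
        self, name: str, challenger: ModelVersion, metric: str, min_gain: float = 0.0
    ) -> bool:
        """Promote the challenger if there is no champion or it wins by more than `min_gain`."""
        current = self.champion(name)
        if current is None or challenger.metrics[metric] > current.metrics[metric] + min_gain:
            self.champions[name] = challenger.version
            return True
        return False


# Made-up evaluation results for three successive training runs.
registry = ModelRegistry()
decisions = []
for run_id, f1 in [("run-a", 0.71), ("run-b", 0.70), ("run-c", 0.74)]:
    candidate = registry.register("ptsd_risk", run_id, {"f1": f1})
    decisions.append(registry.promote_if_better("ptsd_risk", candidate, "f1", min_gain=0.01))
print(decisions, registry.champion("ptsd_risk"))
```

Requiring a minimum gain avoids churning the production model on differences that are within evaluation noise.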

  47. Demo Example - Serving Layer

  48. Demo Example - Serving Layer

  49. Demo Example - ML Package Manager

  50. Summary / Lessons Learned
    • It’s painful to move from development to production
    • The ML ecosystem is very recent, and its tools and frameworks are still maturing
    • DevOps and standard software engineering practices are extremely useful
    • Cross-functional teams should work on ML projects at the same time
  51. Wrap-Up & Q&A