Pycon ZA
October 10, 2019

Machine Learning Engineering principles with Python and MLFlow by Natu Lauchande

Machine Learning is one of the most hyped topics of the moment. While many talks and presentations cover the data science component, very few cover the nitty-gritty details of a machine learning pipeline. This talk will focus on the engineering part of Machine Learning by covering architecture best practices and design strategies for Machine Learning systems. We will delve into the essence of Uber's Michelangelo, Airbnb's Bighead and Facebook's FB Learner. During the talk, I will use MLFlow and Python as platforms to create an open-source solution, similar to the ones from the big tech companies, for the everyday tech startup. The full cycle of training, deployment, monitoring, champion/challenger testing, and the serving layer will be addressed. Technical debt prevention is another topic that will be addressed at the end of the talk.

Transcript

  1. AGENDA
    In Scope:
    1. ML Engineering pain points
    2. Sample of relevant ML Pipeline architectures used across industries
    3. ML Engineering principles to address some of the pain points in the ML development process
    4. Examples of usage of MLFlow and other tools to implement principles
    Out of Scope:
    1. Detailed review of ML systems
    2. Any specific data science or algorithmic solution
    4. Tutorial on using any specific tool (e.g. MLFlow)
  2. Relevant Machine Learning Engineering Principles
    • Software engineering approach, platform view of your ML functioning (CD4ML, Testing, MLOps)
    • Integrated reliable data lifecycle
    • Reproducible and reliable model management
    • Experimentation and rapid prototyping facilities
  3. Ideal Pipeline - Real Time Monitoring Perspective
    • D. Sculley - https://www.youtube.com/watch?v=V18AsBIHlWs
    • ML Score - https://storage.googleapis.com/pub-tools-public-publication-data/pdf/45742.pdf
  4. Data Lifecycle Tooling
    Feature Management, by gojek/feast and Google Cloud (still WIP):
    • Support ingesting feature data via batch or streaming
    • Low latency API
    • Enable discovery and documentation of features
    • Provide an overview of the general health of features in the system
  5. Open source machine learning platform that addresses the following issues:
    1. Experimentation Management: UI to manage and log different models
    2. Reproducibility: Runs the same way anywhere
    3. Standardization of ML Projects: Designed to scale for 1 or 100 000 orgs
    4. Standardization of deployment operations into multiple runtimes (AWS SageMaker, Azure ML, Google Cloud, Kubernetes, etc.)
  9. ML Problem - Bitpred - Problem
    Toy Problem Statement: Will bitcoin move up or down in the next 24 hours? - https://github.com/nlauchande/bitpred
  10. ML Problem - MLFlow
    Immediate Access to a Serving Layer (Batch, REST API)
    • Running the prediction service
    • Calling the REST API
  11. Summary / Lessons learned
    • It’s painful to move from development to production
    • The ML ecosystem is very recent, and tools and frameworks are still maturing
    • DevOps and standard software engineering practices are extremely useful
    • Cross-functional teams working on ML projects at the same time
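
As a rough illustration of the four feature management capabilities listed on slide 4 (batch or streaming ingestion, a low latency API, discovery and documentation, and a health overview), here is a minimal in-memory sketch in Python. It is not Feast's API: the class `InMemoryFeatureStore`, the `FeatureSpec` fields, and the example feature names are all hypothetical, and a real feature store would sit on top of persistent storage.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Tuple


@dataclass
class FeatureSpec:
    """Documentation entry that makes a feature discoverable (hypothetical fields)."""
    name: str
    description: str
    owner: str


class InMemoryFeatureStore:
    """Toy feature store: a registry for discovery plus a dict for lookups."""

    def __init__(self) -> None:
        self._specs: Dict[str, FeatureSpec] = {}
        # Maps (entity_id, feature_name) to the latest known value.
        self._values: Dict[Tuple[str, str], Any] = {}

    def register(self, spec: FeatureSpec) -> None:
        self._specs[spec.name] = spec

    def ingest(self, rows: Iterable[Tuple[str, str, Any]]) -> None:
        """Accept (entity_id, feature_name, value) rows from a batch or a stream."""
        for entity_id, feature, value in rows:
            if feature not in self._specs:
                raise KeyError(f"unregistered feature: {feature}")
            self._values[(entity_id, feature)] = value

    def get_online(self, entity_id: str, features: List[str]) -> Dict[str, Any]:
        """Dictionary lookups standing in for a low latency serving API."""
        return {f: self._values.get((entity_id, f)) for f in features}

    def search(self, keyword: str) -> List[str]:
        """Find features whose name or description mentions the keyword."""
        kw = keyword.lower()
        return sorted(
            name for name, spec in self._specs.items()
            if kw in name.lower() or kw in spec.description.lower()
        )

    def health(self) -> Dict[str, float]:
        """Fraction of known entities that have a non-null value, per feature."""
        entities = {entity_id for entity_id, _ in self._values}
        report: Dict[str, float] = {}
        for name in self._specs:
            filled = sum(1 for e in entities if self._values.get((e, name)) is not None)
            report[name] = filled / len(entities) if entities else 0.0
        return report


if __name__ == "__main__":
    store = InMemoryFeatureStore()
    store.register(FeatureSpec("trips_7d", "Trips taken in the last 7 days", "growth"))
    store.ingest([("u1", "trips_7d", 4)])
    print(store.get_online("u1", ["trips_7d"]))
```

The same `ingest` method serves both a batch (a list of rows) and a stream (a generator yielding rows), which is the main design point the slide hints at.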
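
Slide 5 lists experimentation management as the first issue the platform addresses. The idea can be sketched with the standard library alone: every run appends its parameters and metrics to a log, and the best run is looked up afterwards. `RunLog`, its methods, and the JSON-lines file layout are assumptions made for illustration and are not MLFlow's API.

```python
import json
import tempfile
import time
import uuid
from pathlib import Path
from typing import Any, Dict, List


class RunLog:
    """Append-only log of training runs, stored as one JSON object per line."""

    def __init__(self, path: Path) -> None:
        self.path = Path(path)

    def log_run(self, params: Dict[str, Any], metrics: Dict[str, float]) -> str:
        """Record one run and return its generated identifier."""
        run_id = uuid.uuid4().hex
        record = {"run_id": run_id, "time": time.time(),
                  "params": params, "metrics": metrics}
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(record, sort_keys=True) + "\n")
        return run_id

    def runs(self) -> List[Dict[str, Any]]:
        """Load every logged run, oldest first."""
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as fh:
            return [json.loads(line) for line in fh if line.strip()]

    def best(self, metric: str, higher_is_better: bool = True) -> Dict[str, Any]:
        """Return the run with the best value for the given metric."""
        candidates = [r for r in self.runs() if metric in r["metrics"]]
        if not candidates:
            raise ValueError(f"no run logged the metric {metric!r}")
        pick = max if higher_is_better else min
        return pick(candidates, key=lambda r: r["metrics"][metric])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = RunLog(Path(tmp) / "runs.jsonl")
        log.log_run({"max_depth": 3}, {"accuracy": 0.61})
        log.log_run({"max_depth": 5}, {"accuracy": 0.64})
        print(log.best("accuracy")["params"])
```

Keeping parameters and metrics together in one record is what makes runs comparable later; a tracking UI is essentially a view over such records.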
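
The toy problem on slide 9 can be framed as binary classification. One plausible way to build the labels is sketched below, assuming a series of hourly prices; the slides do not say how the bitpred repository prepares its data, so the function name `label_next_day_moves` and the hourly sampling are assumptions.

```python
from typing import List, Sequence


def label_next_day_moves(hourly_prices: Sequence[float],
                         horizon_hours: int = 24) -> List[int]:
    """Label each hour 1 if the price `horizon_hours` later is higher, else 0.

    The last `horizon_hours` observations get no label because their
    future price is not known yet.
    """
    if horizon_hours <= 0:
        raise ValueError("horizon_hours must be positive")
    return [
        1 if hourly_prices[i + horizon_hours] > hourly_prices[i] else 0
        for i in range(len(hourly_prices) - horizon_hours)
    ]
```

Note that an unchanged price is labelled 0 here, which is a modelling choice rather than something the problem statement dictates.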
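
Slide 10 mentions calling the REST API of the serving layer. The sketch below builds such a call with the standard library. It assumes a model served locally at `http://127.0.0.1:5000/invocations` and a JSON body with a `dataframe_split` key, which is the shape recent MLFlow scoring servers are understood to accept; older releases expected a different request format, so the documentation for the installed version should be checked. The column names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Sequence


def build_scoring_request(url: str,
                          columns: Sequence[str],
                          rows: Sequence[Sequence[Any]]) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying rows to score."""
    # Assumed payload shape: column names plus a list of rows.
    payload = {"dataframe_split": {"columns": list(columns),
                                   "data": [list(row) for row in rows]}}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def score(request: urllib.request.Request, timeout: float = 5.0) -> Any:
    """Send the request and decode the JSON response; needs a running server."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_scoring_request(
        "http://127.0.0.1:5000/invocations",  # assumed local endpoint
        ["open", "close", "volume"],          # hypothetical feature columns
        [[9000.0, 9100.0, 1250.0]],
    )
    print(req.get_method(), req.full_url)
```

Separating request construction from sending keeps the payload easy to inspect and test without a live prediction service.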