
An Introductory Guide to Simplifying the MLOps Process with DVC, CML and MLEM

A talk at DataFestAfrica, the biggest data conference in Africa.

Session link - https://datafestafrica.sessionize.com/speaker/8adc8746-ab40-4fe3-a041-f82b91a0f022

About DatafestAfrica - https://datafestafrica.com/

I talked about:

- Experiment tracking and data versioning with DVC.
- CI/CD for machine learning with CML.
- Model deployment with MLEM.
- MLOps community Lagos.

GiftOjeabulu

October 18, 2022

Transcript

  1. Building and Enlightening Data Professionals in Africa. An annual conference for all data practitioners in Africa. #DataFestAfrica22 #DFA22
  2. Who am I? Gift Ojeabulu (Twitter: @GiftOjeabulu_)
    - Co-founder and community lead at Data Community Africa/DatafestAfrica.
    - Organizer of the MLOps Community Lagos meetup.
    - AWS ML Community Builder, Global AI Hub ML thought leader, technical writer and public speaker.
    - Podcast host at Datapodchat.
    - Founder and facilitator of the African Data Community Newsletter, with over 1.61k subscribers.
    - Technical documentation and content lead for slik-wrangler.
  3. What is MLOps? MLOps (or ML Ops) is a set of practices that aims to deploy and maintain machine learning models in production reliably and efficiently. The word is a compound of "machine learning" and DevOps, the continuous development practice from the software field. Machine learning models are developed and tested in isolated experimental systems. When an algorithm is ready to be launched, data scientists, DevOps engineers, and machine learning engineers practice MLOps together to transition it to production systems.
  4. Why MLOps? MLOps is a set of practices for collaboration and communication between data scientists and operations professionals. Applying these practices improves quality, simplifies management, and automates the deployment of machine learning and deep learning models in large-scale production environments.
  5. DVC

  6. About DVC: Data Version Control (DVC) is a data versioning, ML workflow automation, and experiment management tool that builds on the software engineering toolset you're already familiar with (Git, your IDE, CI/CD, etc.). DVC helps data science and machine learning teams manage large datasets, make projects reproducible, and collaborate better.
  7. Why DVC? Even with all the success we've seen in machine learning today, especially deep learning and its business applications, data scientists still lack best practices for organizing their projects and collaborating effectively. This is a critical challenge: while ML algorithms and methods are no longer tribal knowledge, they are still difficult to implement, reuse, and manage.
  8. Use case: If you store and process data files or datasets to produce other data or machine learning models, and you want to
    - track and save data and machine learning models the same way you capture code;
    - create and switch between versions of data and ML models easily;
    - understand how datasets and ML artifacts were built in the first place;
    - compare model metrics among experiments;
    - adopt engineering tools and best practices in data science projects.
    DVC is for you!
  9. CML

  10. About CML: Continuous Machine Learning (CML) is an open-source library for implementing continuous integration and delivery (CI/CD) in machine learning projects. Use it to automate parts of your development workflow, including model training and evaluation, comparing ML experiments across your project history, and monitoring changing datasets.
  11. About MLEM: MLEM is a tool to easily package, deploy, and serve machine learning models. It supports a variety of scenarios, such as real-time serving and batch processing.
  12. Use case for MLEM: If you train machine learning models and you want to
    - save machine learning models along with all the meta-information required to run them;
    - build your models into ready-to-use formats like Python packages or Docker images;
    - deploy your models, easily switching between different providers when you need to;
    - adopt engineering tools and best practices in data science projects.
    MLEM is for you!
  13. Play DeeVee’s Ramen Run! 1st prize: 75,000 Chimoney; 2nd prize: 50,000 Chimoney; 3rd prize: 25,000 Chimoney.
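Slide 6 describes DVC as a data versioning tool that builds on Git. In practice you run `dvc add data.csv`, which writes a small `data.csv.dvc` file for Git to track while the data itself goes into DVC's cache. The sketch below imitates that idea with the Python standard library only, assuming content addressing by MD5 hash; the `track`/`checkout` functions and the `.ptr` pointer format are hypothetical and are not DVC's internals.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def file_md5(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large datasets never need to fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cache_path(cache_dir: Path, md5: str) -> Path:
    # Two-character prefix directories keep any single folder from growing huge.
    return cache_dir / md5[:2] / md5[2:]


def track(data_file: Path, cache_dir: Path) -> Path:
    """Store data_file in a content-addressed cache and write a small pointer file.

    The pointer file (a hypothetical "<name>.ptr" JSON format) is what you would
    commit to Git; the data itself lives only in the cache.
    """
    md5 = file_md5(data_file)
    target = cache_path(cache_dir, md5)
    if not target.exists():  # identical content is stored only once
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(data_file, target)
    pointer = data_file.with_name(data_file.name + ".ptr")
    pointer.write_text(json.dumps({"path": data_file.name, "md5": md5}))
    return pointer


def checkout(pointer: Path, cache_dir: Path) -> None:
    """Restore the data version recorded in a pointer file."""
    meta = json.loads(pointer.read_text())
    shutil.copy2(cache_path(cache_dir, meta["md5"]), pointer.with_name(meta["path"]))


# Usage: version a file twice, then roll back to the first version.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    cache, data = root / "cache", root / "data.csv"
    data.write_text("a,b\n1,2\n")
    pointer = track(data, cache)
    v1 = pointer.read_text()           # what Git would hold for version 1
    data.write_text("a,b\n1,2\n3,4\n")
    track(data, cache)                 # version 2 overwrites the pointer
    pointer.write_text(v1)             # like checking out the old pointer with Git
    checkout(pointer, cache)
    print(data.read_text())
```

Because the pointer is tiny, switching data versions reduces to switching pointer files in Git and copying the matching object back out of the cache.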
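Slide 8 lists comparing model metrics among experiments as a DVC use case. DVC handles this with commands such as `dvc metrics diff` and `dvc exp show`. As a rough, stdlib-only illustration of what such a comparison computes, the hypothetical `metrics_diff` below lines up two runs' metrics (as you might load them from each run's `metrics.json`) and reports the change. The metric values are made up.

```python
def metrics_diff(old: dict, new: dict) -> dict:
    """Compare two experiments' metrics.

    Returns {metric: (old_value, new_value, change)}. A metric present in only
    one run gets None for the missing side and for the change.
    """
    report = {}
    for name in sorted(set(old) | set(new)):
        a, b = old.get(name), new.get(name)
        change = b - a if a is not None and b is not None else None
        report[name] = (a, b, change)
    return report


# Made-up metrics for a baseline run and a new experiment.
baseline = {"accuracy": 0.80, "loss": 0.50}
experiment = {"accuracy": 0.85, "loss": 0.40, "f1": 0.70}
for metric, (a, b, change) in metrics_diff(baseline, experiment).items():
    delta = "n/a" if change is None else f"{change:+.4f}"
    print(f"{metric}: {a} -> {b} (change: {delta})")
```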
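Slide 10 says CML automates model evaluation and the comparison of experiments inside CI/CD. A common pattern is for the CI job to train the model, write a Markdown report, and post it on the pull request (recent CML releases provide `cml comment create report.md` for this step). The sketch below covers only the report-and-gate logic in plain Python. `render_report`, `passes_gate`, and the tolerance rule are hypothetical choices, not part of CML, and the numbers are made up.

```python
def render_report(baseline: dict, candidate: dict) -> str:
    """Render a Markdown table comparing the main branch's metrics with this branch's."""
    lines = ["| metric | main | this branch |", "|---|---|---|"]
    for name in sorted(set(baseline) | set(candidate)):
        old = baseline.get(name, "n/a")
        new = candidate.get(name, "n/a")
        lines.append(f"| {name} | {old} | {new} |")
    return "\n".join(lines) + "\n"


def passes_gate(baseline: dict, candidate: dict, metric: str, tolerance: float = 0.0) -> bool:
    """Return True unless `metric` dropped by more than `tolerance` (higher is better)."""
    return candidate[metric] >= baseline[metric] - tolerance


# Made-up metrics for the main branch and a pull request.
baseline = {"accuracy": 0.80, "f1": 0.71}
candidate = {"accuracy": 0.83, "f1": 0.69}
print(render_report(baseline, candidate))
print("gate passed:", passes_gate(baseline, candidate, "accuracy"))
```

In a CI job you would write the returned string to `report.md` before posting it, and exit with a non-zero status when `passes_gate` returns False so that a metric regression fails the pipeline.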
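Slides 11 and 12 describe MLEM saving a model together with the meta-information needed to run it, then serving it in real time or in batch. MLEM provides this through its own API and CLI. The stdlib-only sketch below only illustrates the idea and rests on several assumptions: a hypothetical toy `LinearModel`, a JSON file for its parameters, and a `meta.json` whose fields (class name, Python version, feature count) are chosen for illustration. `serve_one` mimics a real-time request handler, and calling `predict` on many rows is the batch path.

```python
import json
import platform
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class LinearModel:
    """Hypothetical toy model standing in for a real trained estimator."""
    weights: list
    bias: float

    def predict(self, rows):
        return [sum(w * x for w, x in zip(self.weights, row)) + self.bias for row in rows]


def save_with_meta(model: LinearModel, path: Path, sample_row: list) -> None:
    """Write the model parameters plus the metadata needed to run them later."""
    path.mkdir(parents=True, exist_ok=True)
    (path / "model.json").write_text(json.dumps(asdict(model)))
    meta = {
        "model_class": type(model).__name__,
        "python_version": platform.python_version(),
        "n_features": len(sample_row),  # input schema inferred from a sample row
    }
    (path / "meta.json").write_text(json.dumps(meta, indent=2))


def load_with_meta(path: Path):
    """Rebuild the model and return it together with its metadata."""
    model = LinearModel(**json.loads((path / "model.json").read_text()))
    meta = json.loads((path / "meta.json").read_text())
    return model, meta


def serve_one(model: LinearModel, meta: dict, request_json: str) -> str:
    """Real-time style: validate one JSON request against the saved schema, then predict."""
    row = json.loads(request_json)["features"]
    if len(row) != meta["n_features"]:
        raise ValueError(f"expected {meta['n_features']} features, got {len(row)}")
    return json.dumps({"prediction": model.predict([row])[0]})


# Usage: save, reload, then serve one request and score a small batch.
with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "models" / "linear"
    save_with_meta(LinearModel(weights=[2.0, -1.0], bias=0.5), model_dir, sample_row=[1.0, 3.0])
    model, meta = load_with_meta(model_dir)
    print(serve_one(model, meta, '{"features": [1.0, 1.0]}'))
    print(model.predict([[0.0, 0.0], [1.0, 2.0]]))  # batch style
```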