DataTalk#40 3/3 -- MLflow & DVC for experiment management in data science

Experiment Management in Machine Learning
By Hacene Karrad, Computer Vision Engineer, Delair

3 We will cover:
• What do we do at Delair?
• ML as an iterative process
• Iterative processes management
• What do we need in an experiment manager?
• Introduction to MLflow
• Dataset tracking with DVC

4 DELAIR UX11 DT26 Delair.ai

6 Data Analysis at Delair
• Semantic Segmentation (2D, point cloud, ...)
• Object Detection
• Instance Segmentation

7 The Machine Learning Iterative Process
• Machine learning experiments are iterative scientific projects.
• Machine learning is about finding the right model for the data.
• Experiments are often hard to track and to reproduce.
Different hyperparameters, different datasets, different code and different architectures ⇒ different model performance (metrics…)
Figure credit: Kenneth Jensen [CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0)]

8 The Machine Learning Iterative Process
In the model exploration and model refinement phases, we generally iterate many times to find a suitable model. Many experiments are run, and most are lost and wasted, even though they can provide great insight.
(Figure label: Today’s scope)

9 Managing Iterative Processes
• Keep track of machine learning experiments.
• Resume suspended research.
• Revisit models, choices, notes…
• Reproduce experimental results.
• Draw and revisit conclusions.
Figure credit: XKCD, Creative Commons Attribution-NonCommercial 2.5 License.
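The needs listed above can be illustrated without any dedicated tool. The following is a minimal sketch, assuming an append-only JSON Lines file as the experiment log; the function names, the file name, the commit identifiers and the metric values are all made up for illustration and do not come from the talk.

```python
import json
import tempfile
from pathlib import Path


def record_experiment(log_path, params, metrics, code_version, notes=""):
    """Append one experiment as a JSON line so earlier runs are never overwritten."""
    entry = {
        "params": params,
        "metrics": metrics,
        "code_version": code_version,  # e.g. a git commit hash
        "notes": notes,
    }
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry, sort_keys=True) + "\n")


def load_experiments(log_path):
    """Read back every recorded experiment, oldest first."""
    with open(log_path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def best_experiment(log_path, metric):
    """Return the recorded experiment with the highest value for `metric`."""
    return max(load_experiments(log_path), key=lambda entry: entry["metrics"][metric])


# Illustrative values only: two hypothetical runs with different learning rates.
with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "experiments.jsonl"
    record_experiment(log_path, {"learning_rate": 0.1}, {"val_accuracy": 0.81},
                      "abc1234", notes="baseline")
    record_experiment(log_path, {"learning_rate": 0.01}, {"val_accuracy": 0.88},
                      "def5678", notes="lower learning rate")
    history = load_experiments(log_path)
    best = best_experiment(log_path, "val_accuracy")

print(len(history), best["params"], best["code_version"])
```

A log like this covers revisiting and comparing runs, but it has no frontend and no shared storage, which is where a dedicated experiment manager comes in.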

10 We were looking for...
We were looking for a tool that:
• Is framework agnostic (PyTorch, TensorFlow, Keras…).
• Centralizes machine learning experiment tracking (collaboration).
• Is not code intrusive.
• Integrates seamlessly with our workflow.
• Is not complicated to back up.
• Does not require an expert to maintain.
(Figure: alternative experiment tracking tools)

11 We found MLflow
What is included:
⇡ Logging params and metrics, and plotting them in a frontend.
⇡ Centralized or distributed storage of artifacts and experiment results.
⇡ Saving notes.
⇡ Code tracking (git hash).
What is not included:
⇣ Hyperparameter optimization.
⇣ Scheduling.
⇣ Data tracking.
⇣ Replaying a training run from the interface.
⇒ That’s exactly what we were expecting.
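To make the "log params, log metrics" workflow concrete, here is a minimal sketch of how such calls can wrap a training loop. The call names `start_run`, `log_param`, `log_metric` and `set_tag` mirror functions of the `mlflow` module; everything else is an assumption for illustration: `InMemoryTracker` is a hypothetical stand-in so the block runs without MLflow installed, and the "training" step is a placeholder, not a real model.

```python
from contextlib import contextmanager


class InMemoryTracker:
    """Hypothetical stand-in exposing the same call names as the mlflow module."""

    def __init__(self):
        self.runs = []
        self._active = None

    @contextmanager
    def start_run(self):
        run = {"params": {}, "metrics": {}, "tags": {}}
        self.runs.append(run)
        self._active = run
        try:
            yield run
        finally:
            self._active = None

    def log_param(self, key, value):
        self._active["params"][key] = str(value)

    def log_metric(self, key, value, step=None):
        self._active["metrics"].setdefault(key, []).append((step, float(value)))

    def set_tag(self, key, value):
        self._active["tags"][key] = str(value)


def train_and_track(tracker, params, dataset_version):
    """Run a toy training loop and log what is needed to compare runs later."""
    with tracker.start_run():
        for name, value in params.items():
            tracker.log_param(name, value)
        tracker.set_tag("dataset_version", dataset_version)
        loss = 1.0
        for epoch in range(params["epochs"]):
            loss *= 1.0 - params["learning_rate"]  # placeholder for a real training step
            tracker.log_metric("loss", loss, step=epoch)
    return loss


tracker = InMemoryTracker()
final_loss = train_and_track(tracker, {"learning_rate": 0.5, "epochs": 3}, "v1")
print(tracker.runs[0]["params"], final_loss)
```

Assuming MLflow is installed, passing the `mlflow` module itself as `tracker` should route the same four calls to a real tracking backend, which is one way to keep the training code only lightly coupled to the tool.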

12 What about dataset tracking?
• Datasets are the most important “hyperparameter”.
• Dataset versioning is important, yet somewhat tricky.
⇒ We use DVC to track datasets.
Source: https://in.pycon.org/cfp/2019/proposals/model-and-dataset-versioning-practices-using-dvc-tool~ej1zd/
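One way to see why dataset versioning is feasible is to treat a dataset version as a fingerprint of its content. The sketch below is only a conceptual illustration, not DVC's implementation: `dataset_fingerprint` is a hypothetical helper, SHA-256 is an arbitrary choice of hash, and the file name and labels are made up.

```python
import hashlib
import tempfile
from pathlib import Path


def dataset_fingerprint(root):
    """Return one hex digest summarising every file name and content under `root`."""
    root = Path(root)
    digest = hashlib.sha256()
    # Sorting makes the result independent of directory listing order.
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest.update(path.relative_to(root).as_posix().encode("utf-8"))
            digest.update(path.read_bytes())
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "labels.csv").write_text("img_001,road\n", encoding="utf-8")
    before = dataset_fingerprint(root)
    unchanged = dataset_fingerprint(root)
    # Changing a single label yields a different fingerprint.
    (root / "labels.csv").write_text("img_001,building\n", encoding="utf-8")
    after = dataset_fingerprint(root)

print(before == unchanged, before == after)
```

Storing such a fingerprint next to each experiment ties a result to the exact data it was trained on; a tool such as DVC additionally stores and retrieves the data itself alongside git.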

Thank You! Q & A

14 Sources
• Meetup MLflow example: https://github.com/dltkhacene/mnist_mlflow
• DVC GitHub example: https://github.com/dltkhacene/meetup-demo-dvc
• Git tutorial: https://www.youtube.com/watch?v=41tsyReTloA
• Delair website: delair.aero
• MLflow official website: http://mlflow.org/docs/
• DVC official website: https://dvc.org/

15 Appendix: Git
Source: https://www.youtube.com/watch?v=41tsyReTloA

Toulouse Data Science
