DataTalk#40 3/3 -- MLflow & DVC for experiment management in data science

Speaker: Hacene Karrad, Delair

The democratization of machine learning and deep learning has made experiment management tools indispensable. These tools make it possible to manage, organize, track and record machine learning experiments. At Delair, we frequently train and deploy machine learning and deep learning solutions applied to drone imagery. We use various state-of-the-art tools to train these models, and mlflow is an essential tool that we use to manage our experiments. In this talk we discuss our motivation for using this tool and present a use case that illustrates its features.

Toulouse Data Science

November 19, 2019

Transcript

  2. Experiment Management in
    Machine Learning
    By Hacene Karrad
    Computer Vision Engineer, Delair

  3. We will cover:
    ● What do we do at Delair?
    ● ML as an iterative process
    ● Managing iterative processes
    ● What do we need from an experiment manager?
    ● Introduction to MLflow
    ● Dataset tracking with DVC

  4. DELAIR
    UX11
    DT26
    Delair.ai

  6. Data Analysis at Delair
    Semantic Segmentation (2D, point cloud, ...)
    Object Detection
    Instance Segmentation

  7. The Machine Learning Iterative Process
    ● Machine learning experiments are iterative scientific projects.
    ● Machine learning is about finding the right model for the data.
    ● Experiments are often hard to track and to reproduce.
    Different hyperparameters, datasets, code and architectures ⇒ different model performance (metrics…)
    Kenneth Jensen [CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0)]

  8. The Machine Learning Iterative Process
    In the model exploration and model refinement phases, we generally iterate many times to find a suitable model.
    Many experiments are run, and most are lost and wasted, even though they could provide great insight.
    Today’s scope

  9. Managing Iterative Processes
    ● Keep track of machine learning experiments.
    ● Resume suspended research.
    ● Revisit models, choices, notes…
    ● Reproduce experimental results.
    ● Draw and revisit conclusions.
    XKCD: Creative Commons Attribution-NonCommercial 2.5 License.
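
A minimal sketch of what keeping track of an experiment can mean in practice, assuming nothing beyond the Python standard library. The `RunRecord` class, its field names and the one-JSON-file-per-run layout are hypothetical; they only illustrate the kind of information (hyperparameters, dataset, code version, metrics, notes) that has to be saved for a run to be resumed, revisited or reproduced. This is not the tool presented in the talk.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class RunRecord:
    """What is needed to revisit or reproduce one training run (hypothetical layout)."""
    params: dict            # hyperparameters, architecture name, ...
    dataset_version: str    # e.g. a dataset hash or tag
    code_version: str       # e.g. a git commit hash
    metrics: dict = field(default_factory=dict)
    notes: str = ""
    run_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    started_at: float = field(default_factory=time.time)

    def save(self, root: Path) -> Path:
        """Write the record as JSON to root/<run_id>.json and return the path."""
        root.mkdir(parents=True, exist_ok=True)
        path = root / f"{self.run_id}.json"
        path.write_text(json.dumps(asdict(self), indent=2, sort_keys=True))
        return path


def load_runs(root: Path) -> list:
    """Load every saved run so that results can be compared later."""
    return [json.loads(p.read_text()) for p in sorted(root.glob("*.json"))]


def best_run(runs: list, metric: str) -> dict:
    """Return the run with the highest value of the given metric."""
    return max(runs, key=lambda r: r["metrics"][metric])
```

Saving one record per run and reloading them with `load_runs` is enough to compare runs after the fact; a dedicated tool adds a shared store and a frontend on top of the same idea.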

  10. We were looking for a tool that:
    ● Is framework agnostic (PyTorch, TensorFlow, Keras…).
    ● Centralizes machine learning experiment tracking (collaboration).
    ● Is not code intrusive.
    ● Integrates seamlessly with our workflow.
    ● Is not complicated to back up.
    ● Does not require an expert to maintain.
    Alternative experiment tracking tools

  11. We found MLflow
    What is included:
    ⇡ Logging params and metrics, and plotting them in a frontend.
    ⇡ Centralized or distributed storage of artifacts and experiment results.
    ⇡ Saving notes.
    ⇡ Code tracking (git hash).
    What is not included:
    ⇣ Hyperparameter optimization.
    ⇣ Scheduling.
    ⇣ Data tracking.
    ⇣ Replaying training from the interface.
    ⇒ That’s exactly what we were expecting.

  12. What about dataset tracking?
    ● Datasets are the most important “hyperparameter”.
    ● Dataset versioning is important, yet somewhat tricky.
    ⇒ We use DVC to track datasets.
    Source: https://in.pycon.org/cfp/2019/proposals/model-and-dataset-versioning-practices-using-dvc-tool~ej1zd/
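
The idea behind dataset versioning can be sketched with the standard library alone: identify a dataset by a hash of its content and record that hash with each run. DVC works on a similar principle, keeping a content hash in a small metadata file that git tracks while the data itself is stored elsewhere. The `dataset_fingerprint` helper below is hypothetical and is not DVC's implementation or API.

```python
import hashlib
from pathlib import Path


def dataset_fingerprint(root: Path) -> str:
    """Hash the relative path and content of every file under root, in sorted order."""
    digest = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        # Include the relative path so that renaming a file changes the fingerprint.
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        with path.open("rb") as handle:
            # Read in 1 MiB chunks so that large files are not loaded into memory at once.
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
    return digest.hexdigest()
```

The returned string can be logged as one more run parameter, so that two runs trained on different data are not compared as if only the hyperparameters had changed.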

  13. Thank You!
    Q & A

  14. Sources
    ● Meetup MLflow example: https://github.com/dltkhacene/mnist_mlflow
    ● DVC GitHub example: https://github.com/dltkhacene/meetup-demo-dvc
    ● Git tutorial: https://www.youtube.com/watch?v=41tsyReTloA
    ● Delair website: delair.aero
    ● MLflow official website: http://mlflow.org/docs/
    ● DVC official website: https://dvc.org/

  15. Appendix: Git
    Source: https://www.youtube.com/watch?v=41tsyReTloA
