From Chaos to Order: How MLFlow Transforms Experiment Tracking and Reproducibility

Vikas Shetty
August 30, 2023

Transcript

  1. Agenda: 1) Understanding why ML is different 2) Pain points in the traditional approach 3) Enter MLFlow 4) Using the OSS version 5) Problems with it 6) Ways to solve them
  2. Why is ML different from traditional software • Once you have the code, you can recreate/build the same software again (as long as the code remains the same)
  3. So what’s different • More variables that can change • A non-reproducible process • Keeping track of data vs. keeping track of code
  4. What really drives ML • Metrics!!! • How accurate are your predictions? • Tracking these metrics and improving them incrementally • Testing the model on data not used for training (a quick sketch follows)
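
    A minimal sketch of that last point, scoring on held-out data; the dataset and model (scikit-learn’s bundled iris set, a logistic regression) are illustrative stand-ins:

    ```python
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    # Hold out 20% of the data that the model never trains on.
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    # Fit on the training split only, then score on the held-out split:
    # this accuracy is the metric you track and try to improve.
    model = LogisticRegression(max_iter=200).fit(X_train, y_train)
    print(accuracy_score(y_test, model.predict(X_test)))
    ```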
  5. What is messy about tracking metrics • Need to manage associated files • For example, TensorBoard stores the files required for graphs and other details in a ‘log’ folder • Things can get out of hand quickly once you have more iterations
  6. The lesson we learnt from the whales • In January 2016, deepsense.ai won the Right Whale Recognition contest on Kaggle • To collect the prize, the winning team had to provide the source code and a description of how to recreate the winning solution • The team took 3 weeks to recreate it • Funnily enough, that trauma led deepsense.ai to build Neptune.ai (a tool similar to MLFlow)
  7. Introducing MLFlow • Open-source project developed by Databricks to address pain points in the ML lifecycle • ML projects can get complex and hard to reproduce or track • Framework-independent (a minimal logging sketch follows)
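
    To make that concrete, a minimal sketch of logging a run with the MLFlow Python API; the run name, parameter, metric value, and file name are all illustrative:

    ```python
    import mlflow

    with mlflow.start_run(run_name="baseline"):
        mlflow.log_param("learning_rate", 0.01)      # hyperparameters
        mlflow.log_metric("accuracy", 0.87)          # metrics, loggable per step
        mlflow.log_artifact("confusion_matrix.png")  # any local file (must exist)
    ```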
  8. Making it more friendly • Experiment runs are listed by run date • It’s nice but not easy on the eyes • Tag runs with a version number for reference (sketched below) • Multiple teams can run different experiments on the same server
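
    Tagging and per-team separation might look like this; the experiment name and tag keys are our own convention, not something MLFlow prescribes:

    ```python
    import mlflow

    mlflow.set_experiment("team-a-churn-model")  # one experiment per team/project
    with mlflow.start_run():
        mlflow.set_tag("version", "v1.4.2")  # version tag for easy reference
        mlflow.set_tag("team", "team-a")
    ```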
  9. Our initial setup • Well, we can track experiments with ease now • How do I show the experiments to my coworker? • “Can you see my screen real quick?”
  10. Let’s deploy it on an instance • Run MLFlow on a server (sketch below) • Anyone can access the URL :| • Why is the UI slow? • Maybe we have too many experiments • Graphs have a limit on how many values you can log
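
    Deploying boils down to starting the tracking server on the instance and pointing every client at it; the hostname below is a placeholder:

    ```python
    # On the instance (shell): mlflow server --host 0.0.0.0 --port 5000
    import mlflow

    # Clients then log to the shared server instead of a local ./mlruns folder.
    mlflow.set_tracking_uri("http://mlflow.internal.example:5000")
    ```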
  11. A few moments later • Store experiments in a DB • Store artifacts on cloud storage • Both options are supported by MLFlow (sketch below) • Still don’t have auth
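
    Both options map to flags on the server command; the connection string and bucket are placeholders, and the subprocess wrapper is only there to keep the sketch self-contained (in practice this would live in a shell script or service unit):

    ```python
    import subprocess

    subprocess.run([
        "mlflow", "server",
        "--backend-store-uri", "postgresql://mlflow:PASSWORD@db.internal:5432/mlflow",
        "--default-artifact-root", "s3://example-mlflow-bucket/artifacts",
        "--host", "0.0.0.0",
        "--port", "5000",
    ])
    ```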
  12. Let’s just use Nginx? • At least the URL isn’t open anymore • Basic username and password authentication • Low-effort solution (a sample config follows)
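
    A rough sketch of that low-effort setup as an nginx reverse proxy with HTTP Basic auth; the server name and htpasswd path are placeholders:

    ```nginx
    server {
        listen 80;
        server_name mlflow.internal.example;  # placeholder

        auth_basic           "MLFlow";
        auth_basic_user_file /etc/nginx/.htpasswd;  # created with the htpasswd tool

        location / {
            proxy_pass http://127.0.0.1:5000;  # MLFlow server behind the proxy
        }
    }
    ```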
  13. MLFlow Auth • Relatively new • Still in an experimental phase • Permissions and roles for managing users (which we were missing); usage sketched below
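
    Enabling it is a flag on the server, and clients authenticate through environment variables; the credentials below are placeholders:

    ```python
    # On the server (shell): mlflow server --app-name basic-auth ...
    import os
    import mlflow

    os.environ["MLFLOW_TRACKING_USERNAME"] = "alice"      # placeholder
    os.environ["MLFLOW_TRACKING_PASSWORD"] = "change-me"  # placeholder
    mlflow.set_tracking_uri("http://mlflow.internal.example:5000")
    ```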
  14. How MLFlow changed our work • One place to find everything related to an experiment • Metrics, parameters, model artifacts • Easy access to details (RIP Google Sheets) • The number of iterations didn’t matter anymore • A proper goodbye to debugging nightmares