Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
ML Models and Dataset Versioning
Search
Kurian Benoy
October 13, 2019
Programming
0
480
ML Models and Dataset Versioning
Kurian Benoy
October 13, 2019
Tweet
Share
More Decks by Kurian Benoy
See All by Kurian Benoy
How I ended up maintaining a python package with 1M+ downloads so far?
kurianbenoy
0
6
MTech Final Project - Presentation Slides
kurianbenoy
0
13
Project Review Report 5 - MTech Project
kurianbenoy
1
37
Joy of Programming
kurianbenoy
0
44
Expert Interaction on ML
kurianbenoy
0
84
Project Review Report 4 - Robust Speech Recognition in Malayalam
kurianbenoy
0
89
Final project report - Phase 1
kurianbenoy
0
65
Project Review Slides
kurianbenoy
0
24
Demysitfying Async&Await in Python and JavaScript
kurianbenoy
0
170
Other Decks in Programming
See All in Programming
Oracle Database Technology Night 92 Database Connection control FAN-AC
oracle4engineer
PRO
1
440
デザイナーが Androidエンジニアに 挑戦してみた
874wokiite
0
280
ファインディ株式会社におけるMCP活用とサービス開発
starfish719
0
290
Go言語での実装を通して学ぶLLMファインチューニングの仕組み / fukuokago22-llm-peft
monochromegane
0
120
TDD 実践ミニトーク
contour_gara
1
290
Android 16 × Jetpack Composeで縦書きテキストエディタを作ろう / Vertical Text Editor with Compose on Android 16
cc4966
0
180
AIコーディングAgentとの向き合い方
eycjur
0
270
2025 年のコーディングエージェントの現在地とエンジニアの仕事の変化について
azukiazusa1
22
12k
為你自己學 Python - 冷知識篇
eddie
1
350
Compose Multiplatform × AI で作る、次世代アプリ開発支援ツールの設計と実装
thagikura
0
140
はじめてのMaterial3 Expressive
ym223
2
250
Putting The Genie in the Bottle - A Crash Course on running LLMs on Android
iurysza
0
140
Featured
See All Featured
KATA
mclloyd
32
14k
Responsive Adventures: Dirty Tricks From The Dark Corners of Front-End
smashingmag
252
21k
The Power of CSS Pseudo Elements
geoffreycrofte
77
6k
Optimising Largest Contentful Paint
csswizardry
37
3.4k
ピンチをチャンスに:未来をつくるプロダクトロードマップ #pmconf2020
aki_iinuma
126
53k
Typedesign – Prime Four
hannesfritz
42
2.8k
The Myth of the Modular Monolith - Day 2 Keynote - Rails World 2024
eileencodes
26
3k
Java REST API Framework Comparison - PWX 2021
mraible
33
8.8k
[RailsConf 2023 Opening Keynote] The Magic of Rails
eileencodes
30
9.6k
4 Signs Your Business is Dying
shpigford
184
22k
I Don’t Have Time: Getting Over the Fear to Launch Your Podcast
jcasabona
33
2.4k
Helping Users Find Their Own Way: Creating Modern Search Experiences
danielanewman
29
2.9k
Transcript
ML MODELS AND DATASET VERSIONING Kurian Benoy
$ WHOAMI Open source contributor FOSSASIA OpenTechNights Winner Kaggle Expert
in Kernels
$ WHOAMI Open source contributor FOSSASIA OpenTechNights Winner Kaggle Expert
Final Year BTech student @MEC
OUTLINE Start up Adventures Challenges Model and Dataset versioning How
I discovered DVC? Use case: Versioning dogs and Cats Conclusion
Startup Adventures
CHALLENGE 1: ML IS SLOW
CHALLENGE 2: WORKING WITH ML PROJECTS Most software products take
a few seconds to execute. $ git clone project-repo $ pip install -r requirements.txt
None
CHALLENGE 3: METRIC DRIVEN
CHALLENGE 4: NOT ABLE TO USE GIT git not suitable
for projects > 1GB git clone becomes slow
MODEL VERSIONING
TRACKING EXPERIMENTS TRACKING METRICS
Why Model Versioning? > To keep track of experiments >
Choose the best ideas >> EXPERIMENTS = CODE + OUTPUTS Models are outputs
DATASET VERSIONING
None
4 TB/day
None
Why Dataset management? > Moving Datasets around > Datasets evolve,
so versioning required >> EXPERIMENTS = CODE + DATA + OUTPUTS Source code, Datasets
HOW I DISCOVERED DVC
DATA VERSION CONTROL(DVC)
> Experiment and Dataset tracking > Open-source(3500+ stars) > Build
to adopt the best practises of ML > Works well with git > Language and framework agnostic
VERSIONING CATS & DOGS
DEMO TIME
DVC WORKFLOW
Tracking data 1 Tracking 1000 cats and dogs 2 Add
1000 more labelled images of cats & dogs
SWITCHING VERSIONS
CONCLUSION
"Data science as different from software as software was different
from hardware." Nick Elprin, CEO, DominoLabs.
Think about your processes(ML projects)
Think about your processes Try to version control for your
projects
Try it out in your ML project!
THANK YOU Twitter: kurianbenoy2 Email :
[email protected]
Speaker Deck: bit.ly/mlversion19
APPENDIX
Other Tools for versioning ML Flow - Tracking Models, Metrics
Git-LFS - Tracking Large files Jovian - JupyterNB based tracking Neptune.Ml Hangar Py - Versioning Tensor Data