ML Models and Dataset Versioning
Kurian Benoy
October 13, 2019
Programming
Transcript
ML MODELS AND DATASET VERSIONING Kurian Benoy
$ WHOAMI
Open source contributor
FOSSASIA OpenTechNights winner
Kaggle Expert in Kernels
Final-year BTech student @MEC
OUTLINE
Startup Adventures
Challenges
Model and Dataset Versioning
How I Discovered DVC
Use Case: Versioning Cats and Dogs
Conclusion
Startup Adventures
CHALLENGE 1: ML IS SLOW
CHALLENGE 2: WORKING WITH ML PROJECTS
Most software products take only a few seconds to get running:
$ git clone project-repo
$ pip install -r requirements.txt
CHALLENGE 3: METRIC DRIVEN
CHALLENGE 4: NOT ABLE TO USE GIT
Git is not suitable for projects larger than 1 GB: git clone becomes slow.
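The usual workaround, and the idea DVC builds on, is to keep the large file out of git. Git then versions only a small text file that identifies the data by its content hash.

The Python sketch below assumes that approach. `file_md5` reads the file in chunks, so a multi-gigabyte dataset never has to fit in memory. `write_pointer` writes a hypothetical JSON pointer format, not DVC's real `.dvc` format, that git can version cheaply.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 of a file, reading it 1 MiB at a time by default."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps calling fh.read until it returns b"" (EOF)
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_pointer(data_path: Path, pointer_path: Path) -> dict:
    """Write a small JSON pointer (hypothetical format) describing `data_path`."""
    pointer = {
        "path": data_path.name,
        "md5": file_md5(data_path),
        "size": data_path.stat().st_size,
    }
    pointer_path.write_text(json.dumps(pointer, indent=2) + "\n")
    return pointer


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        big = Path(tmp) / "images.bin"
        big.write_bytes(b"\x00" * 5_000_000)  # stand-in for a large data file
        pointer_file = Path(tmp) / "images.bin.ptr"
        write_pointer(big, pointer_file)
        # git would track the small pointer file, not the 5 MB data file
        print(big.stat().st_size, pointer_file.stat().st_size)
```

The pointer file stays under a hundred bytes however big the data grows, so cloning the repository stays fast.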
MODEL VERSIONING
TRACKING EXPERIMENTS
TRACKING METRICS
Why Model Versioning?
> To keep track of experiments
> To choose the best ideas
>> EXPERIMENTS = CODE + OUTPUTS (models are outputs)
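Treating each experiment as code plus outputs turns "choose the best ideas" into a query over a log of runs. The sketch below assumes each run records three things: the git commit of its code, its parameters, and its metrics.

Everything in it is hypothetical illustration, not the output of any real tool. That includes the `Experiment` record, the `best_experiment` helper, and the commit IDs and scores.

```python
from dataclasses import dataclass, field


@dataclass
class Experiment:
    """One training run: the code that produced it plus its outputs."""
    name: str
    code_version: str  # e.g. a git commit hash (made-up values below)
    params: dict = field(default_factory=dict)
    metrics: dict = field(default_factory=dict)


def best_experiment(runs, metric, higher_is_better=True):
    """Return the run with the best value of `metric`, ignoring runs that lack it."""
    scored = [run for run in runs if metric in run.metrics]
    if not scored:
        raise ValueError(f"no run reported {metric!r}")

    def value(run):
        return run.metrics[metric]

    return max(scored, key=value) if higher_is_better else min(scored, key=value)


runs = [
    Experiment("baseline", "a1b2c3d", {"lr": 0.01}, {"accuracy": 0.81, "loss": 0.52}),
    Experiment("higher-lr", "e4f5a6b", {"lr": 0.1}, {"accuracy": 0.77, "loss": 0.61}),
    Experiment("augmented", "c7d8e9f", {"lr": 0.01, "flip": True}, {"accuracy": 0.86, "loss": 0.41}),
]
print(best_experiment(runs, "accuracy").name)  # augmented
print(best_experiment(runs, "loss", higher_is_better=False).name)  # augmented
```

Keeping the commit hash next to the metrics is what lets you go back and reproduce the winning model. Dedicated trackers such as MLflow record the same kind of information.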
DATASET VERSIONING
4 TB/day
Why Dataset Management?
> Datasets have to be moved around
> Datasets evolve, so they need versioning
>> EXPERIMENTS = CODE + DATA + OUTPUTS (CODE: source code; DATA: datasets)
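Once data is part of the equation, two runs of identical code on different dataset versions are different experiments. One way to make that concrete is to derive an experiment ID from hashes of both the code and the data.

The `fingerprint` helper and the byte strings below are hypothetical. This sketch shows the idea only; it is not how DVC identifies experiments.

```python
import hashlib


def fingerprint(*parts: bytes) -> str:
    """Derive one experiment ID from several inputs (code, data, config...)."""
    outer = hashlib.sha256()
    for part in parts:
        # Hashing each part first gives fixed-size pieces, so b"ab" + b"c"
        # and b"a" + b"bc" cannot produce the same ID.
        outer.update(hashlib.sha256(part).digest())
    return outer.hexdigest()


code = b"model = train(images, lr=0.01)"            # hypothetical training script
data_v1 = b"cat_0001.jpg,cat\ndog_0001.jpg,dog\n"   # hypothetical label file
data_v2 = data_v1 + b"cat_0002.jpg,cat\n"           # the same file after new labels

run_v1 = fingerprint(code, data_v1)
run_v2 = fingerprint(code, data_v2)
print(run_v1 == run_v2)  # False: same code, different data
```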
HOW I DISCOVERED DVC
DATA VERSION CONTROL (DVC)
> Experiment and dataset tracking
> Open source (3500+ GitHub stars at the time of the talk)
> Built to adopt ML best practices
> Works well with git
> Language and framework agnostic
VERSIONING CATS & DOGS
DEMO TIME
DVC WORKFLOW
Tracking data
1. Track 1000 images of cats and dogs
2. Add 1000 more labelled images of cats and dogs
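A dataset version can be thought of as a manifest that maps each file to the hash of its contents. Adding 1000 more images then shows up as new manifest entries, while the original files keep their hashes.

The sketch below uses a handful of fake files to stand in for the images. `snapshot` and `diff` are hypothetical helpers. In DVC itself, the tracking step is `dvc add <directory>`, which writes a small `.dvc` file that you commit with git.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(root: Path) -> dict:
    """Map each file under `root` (as a relative POSIX path) to the MD5 of its contents."""
    return {
        path.relative_to(root).as_posix(): hashlib.md5(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def diff(old: dict, new: dict) -> dict:
    """List files added, removed, or modified between two snapshots."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "modified": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp)
        for label in ("cats", "dogs"):
            (data / label).mkdir()
        # Version 1: three fake images per class stand in for the first 1000
        for i in range(3):
            (data / "cats" / f"cat_{i}.jpg").write_bytes(f"cat {i}".encode())
            (data / "dogs" / f"dog_{i}.jpg").write_bytes(f"dog {i}".encode())
        v1 = snapshot(data)
        # Version 2: more labelled images arrive
        for i in range(3, 5):
            (data / "cats" / f"cat_{i}.jpg").write_bytes(f"cat {i}".encode())
            (data / "dogs" / f"dog_{i}.jpg").write_bytes(f"dog {i}".encode())
        v2 = snapshot(data)
        print(len(v1), len(v2))  # 6 10
        print(diff(v1, v2))
```

Unchanged images keep the same hash, so a store keyed by content only has to save the new files for version 2.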
SWITCHING VERSIONS
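Switching versions is cheap when every file's contents live in a cache keyed by hash. A version is then just a manifest, and checking one out copies the listed blobs back into the workspace.

The sketch below is loosely modelled on DVC's cache, and `commit` and `checkout` are hypothetical helpers. This `checkout` simply deletes and rebuilds the workspace, which a real tool would do more carefully.

With DVC, the equivalent step is to check out an older `.dvc` file with git and then run `dvc checkout`. For example: `git checkout <revision> data.dvc`.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def blob_path(cache: Path, digest: str) -> Path:
    # Split the hash into a two-character directory to keep folders small
    return cache / digest[:2] / digest[2:]


def commit(workspace: Path, cache: Path) -> dict:
    """Store every file in the cache by content hash; return the version manifest."""
    manifest = {}
    for path in sorted(workspace.rglob("*")):
        if path.is_file():
            digest = hashlib.md5(path.read_bytes()).hexdigest()
            blob = blob_path(cache, digest)
            if not blob.exists():  # identical contents are stored only once
                blob.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(path, blob)
            manifest[path.relative_to(workspace).as_posix()] = digest
    return manifest


def checkout(manifest: dict, workspace: Path, cache: Path) -> None:
    """Rebuild `workspace` so it matches `manifest` exactly."""
    if workspace.exists():
        shutil.rmtree(workspace)  # destructive: fine for a sketch, not for real data
    for rel_path, digest in manifest.items():
        dest = workspace / rel_path
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(blob_path(cache, digest), dest)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data, cache = Path(tmp) / "data", Path(tmp) / "cache"
        data.mkdir()
        (data / "cat_0.jpg").write_bytes(b"cat 0")
        v1 = commit(data, cache)
        (data / "cat_1.jpg").write_bytes(b"cat 1")
        v2 = commit(data, cache)
        checkout(v1, data, cache)  # back to the first version
        print(sorted(p.name for p in data.iterdir()))  # ['cat_0.jpg']
        checkout(v2, data, cache)  # and forward again
        print(sorted(p.name for p in data.iterdir()))  # ['cat_0.jpg', 'cat_1.jpg']
```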
CONCLUSION
"Data science is as different from software as software was from hardware." Nick Elprin, CEO, Domino Data Lab
Think about your processes (ML projects)
Try to use version control in your projects
Try it out in your ML project!
THANK YOU
Twitter: kurianbenoy2
Email: [email protected]
Speaker Deck: bit.ly/mlversion19
APPENDIX
Other tools for versioning
> MLflow - tracking models and metrics
> Git LFS - tracking large files
> Jovian - Jupyter notebook-based tracking
> Neptune.ml
> Hangar Py - versioning tensor data