ML Models and Dataset Versioning
Kurian Benoy
October 13, 2019
Transcript
ML MODELS AND DATASET VERSIONING Kurian Benoy
$ WHOAMI
Open source contributor
FOSSASIA OpenTechNights Winner
Kaggle Expert in Kernels
Final Year BTech student @MEC
OUTLINE
Startup Adventures
Challenges
Model and Dataset Versioning
How I Discovered DVC
Use Case: Versioning Cats and Dogs
Conclusion
Startup Adventures
CHALLENGE 1: ML IS SLOW
CHALLENGE 2: WORKING WITH ML PROJECTS
Most software products take a few seconds to execute.
$ git clone project-repo
$ pip install -r requirements.txt
CHALLENGE 3: METRIC DRIVEN
CHALLENGE 4: NOT ABLE TO USE GIT
git is not suitable for projects > 1GB
git clone becomes slow
MODEL VERSIONING
TRACKING EXPERIMENTS TRACKING METRICS
Why Model Versioning?
> To keep track of experiments
> Choose the best ideas
>> EXPERIMENTS = CODE + OUTPUTS
Models are outputs
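One way to picture "EXPERIMENTS = CODE + OUTPUTS" is to record each run as a pair of content hashes plus its metrics, then pick the best run by a chosen metric. This is a minimal sketch, not how any particular tool stores experiments; the names `Experiment`, `record_experiment` and `best_experiment` are hypothetical and introduced only for illustration.

```python
import hashlib
import json
from dataclasses import dataclass, field


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class Experiment:
    """Hypothetical record: an experiment is its code plus its outputs."""
    code_hash: str
    model_hash: str
    metrics: dict = field(default_factory=dict)

    @property
    def experiment_id(self) -> str:
        # The same code and the same model always give the same id.
        payload = json.dumps([self.code_hash, self.model_hash])
        return sha256_of(payload.encode())[:12]


def record_experiment(code: bytes, model: bytes, metrics: dict) -> Experiment:
    """Hash the training code and the resulting model, and keep the metrics."""
    return Experiment(sha256_of(code), sha256_of(model), dict(metrics))


def best_experiment(experiments, metric: str) -> Experiment:
    """Choose the run with the highest value for the given metric."""
    return max(experiments, key=lambda e: e.metrics[metric])
```

Because the id is derived from content rather than from a file name or timestamp, two runs with identical code and model collapse to the same experiment, which makes duplicate runs easy to spot.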
DATASET VERSIONING
4 TB/day
Why Dataset Management?
> Moving datasets around
> Datasets evolve, so versioning is required
>> EXPERIMENTS = CODE + DATA + OUTPUTS
Source code, Datasets
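To make "datasets evolve, so versioning is required" concrete, a dataset directory can be reduced to a single fingerprint that changes whenever a file is added, removed, renamed or edited. The following is a rough sketch using only the Python standard library; `dataset_fingerprint` is a hypothetical helper, and the choice of SHA-256 is an assumption made for the example.

```python
import hashlib
from pathlib import Path


def dataset_fingerprint(root: Path) -> str:
    """Hash every file under root (relative path + content) into one digest."""
    digest = hashlib.sha256()
    # Sort the paths so the result does not depend on directory listing order.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode())
        digest.update(b"\0")  # separator between the path and the content hash
        digest.update(hashlib.sha256(path.read_bytes()).digest())
    return digest.hexdigest()
```

Storing this short string next to the code in git is enough to tell which version of the data a given experiment used, without putting the data itself into the repository.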
HOW I DISCOVERED DVC
DATA VERSION CONTROL(DVC)
> Experiment and dataset tracking
> Open source (3500+ stars)
> Built to adopt the best practices of ML
> Works well with git
> Language and framework agnostic
VERSIONING CATS & DOGS
DEMO TIME
DVC WORKFLOW
Tracking data
1 Tracking 1000 cats and dogs
2 Add 1000 more labelled images of cats & dogs
SWITCHING VERSIONS
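The demo itself is not captured in the transcript. As a stand-in, the idea of tracking a first version of the images, adding more, and then switching between versions can be sketched as a toy content-addressed store. This is not DVC's implementation and does not call DVC; with DVC the corresponding steps would use commands such as `dvc add` and `dvc checkout` together with git. The functions `snapshot` and `checkout` below are hypothetical, and the cache directory is assumed to live outside the data directory.

```python
import hashlib
import shutil
from pathlib import Path


def snapshot(data_dir: Path, cache_dir: Path) -> dict:
    """Copy each file into a content-addressed cache; return {relative path: hash}."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    manifest = {}
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        file_hash = hashlib.sha256(path.read_bytes()).hexdigest()
        cached = cache_dir / file_hash
        if not cached.exists():  # unchanged files are stored only once
            shutil.copyfile(path, cached)
        manifest[path.relative_to(data_dir).as_posix()] = file_hash
    return manifest


def checkout(manifest: dict, data_dir: Path, cache_dir: Path) -> None:
    """Make data_dir match the manifest exactly, restoring files from the cache."""
    if data_dir.exists():
        shutil.rmtree(data_dir)
    data_dir.mkdir(parents=True)
    for rel_path, file_hash in manifest.items():
        target = data_dir / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(cache_dir / file_hash, target)
```

The manifest is small enough to commit to git, so switching dataset versions becomes a matter of checking out an older manifest and restoring the files it names, while the large files stay in the cache.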
CONCLUSION
"Data science is as different from software as software was different from hardware."
Nick Elprin, CEO, Domino Data Lab
Think about your processes (ML projects)
Try version control for your projects
Try it out in your ML project!
THANK YOU
Twitter: kurianbenoy2
Email: [email protected]
Speaker Deck: bit.ly/mlversion19
APPENDIX
Other Tools for Versioning
MLflow - Tracking models, metrics
Git LFS - Tracking large files
Jovian - Jupyter notebook based tracking
Neptune.ml
Hangar Py - Versioning tensor data