Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
ML Models and Dataset Versioning
Search
Sponsored
·
SiteGround - Reliable hosting with speed, security, and support you can count on.
→
Kurian Benoy
October 13, 2019
Programming
0
490
ML Models and Dataset Versioning
Kurian Benoy
October 13, 2019
Tweet
Share
More Decks by Kurian Benoy
See All by Kurian Benoy
How I ended up maintaining a python package with 1M+ downloads so far?
kurianbenoy
0
9
MTech Final Project - Presentation Slides
kurianbenoy
0
30
Project Review Report 5 - MTech Project
kurianbenoy
1
42
Joy of Programming
kurianbenoy
0
47
Expert Interaction on ML
kurianbenoy
0
90
Project Review Report 4 - Robust Speech Recognition in Malayalam
kurianbenoy
0
110
Final project report - Phase 1
kurianbenoy
0
79
Project Review Slides
kurianbenoy
0
34
Demysitfying Async&Await in Python and JavaScript
kurianbenoy
0
180
Other Decks in Programming
See All in Programming
AIフル活用時代だからこそ学んでおきたい働き方の心得
shinoyu
0
140
SourceGeneratorのススメ
htkym
0
200
開発者から情シスまで - 多様なユーザー層に届けるAPI提供戦略 / Postman API Night Okinawa 2026 Winter
tasshi
0
200
「ブロックテーマでは再現できない」は本当か?
inc2734
0
1k
CSC307 Lecture 10
javiergs
PRO
1
660
AI時代の認知負荷との向き合い方
optfit
0
160
CSC307 Lecture 08
javiergs
PRO
0
670
16年目のピクシブ百科事典を支える最新の技術基盤 / The Modern Tech Stack Powering Pixiv Encyclopedia in its 16th Year
ahuglajbclajep
5
1k
例外処理とどう使い分ける?Result型を使ったエラー設計 #burikaigi
kajitack
16
6.1k
[KNOTS 2026登壇資料]AIで拡張‧交差する プロダクト開発のプロセス および携わるメンバーの役割
hisatake
0
290
AIによる開発の民主化を支える コンテキスト管理のこれまでとこれから
mulyu
3
370
Vibe Coding - AI 驅動的軟體開發
mickyp100
0
180
Featured
See All Featured
Building a Modern Day E-commerce SEO Strategy
aleyda
45
8.7k
Noah Learner - AI + Me: how we built a GSC Bulk Export data pipeline
techseoconnect
PRO
0
110
Improving Core Web Vitals using Speculation Rules API
sergeychernyshev
21
1.4k
Beyond borders and beyond the search box: How to win the global "messy middle" with AI-driven SEO
davidcarrasco
1
55
The Myth of the Modular Monolith - Day 2 Keynote - Rails World 2024
eileencodes
26
3.3k
We Analyzed 250 Million AI Search Results: Here's What I Found
joshbly
1
730
Test your architecture with Archunit
thirion
1
2.2k
Being A Developer After 40
akosma
91
590k
How STYLIGHT went responsive
nonsquared
100
6k
Context Engineering - Making Every Token Count
addyosmani
9
660
How to Talk to Developers About Accessibility
jct
2
130
Connecting the Dots Between Site Speed, User Experience & Your Business [WebExpo 2025]
tammyeverts
11
830
Transcript
ML MODELS AND DATASET VERSIONING Kurian Benoy
$ WHOAMI Open source contributor FOSSASIA OpenTechNights Winner Kaggle Expert
in Kernels
$ WHOAMI Open source contributor FOSSASIA OpenTechNights Winner Kaggle Expert
Final Year BTech student @MEC
OUTLINE Start up Adventures Challenges Model and Dataset versioning How
I discovered DVC? Use case: Versioning dogs and Cats Conclusion
Startup Adventures
CHALLENGE 1: ML IS SLOW
CHALLENGE 2: WORKING WITH ML PROJECTS Most software products take
a few seconds to execute. $ git clone project-repo $ pip install -r requirements.txt
None
CHALLENGE 3: METRIC DRIVEN
CHALLENGE 4: NOT ABLE TO USE GIT git not suitable
for projects > 1GB git clone becomes slow
MODEL VERSIONING
TRACKING EXPERIMENTS TRACKING METRICS
Why Model Versioning? > To keep track of experiments >
Choose the best ideas >> EXPERIMENTS = CODE + OUTPUTS Models are outputs
DATASET VERSIONING
None
4 TB/day
None
Why Dataset management? > Moving Datasets around > Datasets evolve,
so versioning required >> EXPERIMENTS = CODE + DATA + OUTPUTS Source code, Datasets
HOW I DISCOVERED DVC
DATA VERSION CONTROL(DVC)
> Experiment and Dataset tracking > Open-source(3500+ stars) > Build
to adopt the best practises of ML > Works well with git > Language and framework agnostic
VERSIONING CATS & DOGS
DEMO TIME
DVC WORKFLOW
Tracking data 1 Tracking 1000 cats and dogs 2 Add
1000 more labelled images of cats & dogs
SWITCHING VERSIONS
CONCLUSION
"Data science as different from software as software was different
from hardware." Nick Elprin, CEO, DominoLabs.
Think about your processes(ML projects)
Think about your processes Try to version control for your
projects
Try it out in your ML project!
THANK YOU Twitter: kurianbenoy2 Email :
[email protected]
Speaker Deck: bit.ly/mlversion19
APPENDIX
Other Tools for versioning ML Flow - Tracking Models, Metrics
Git-LFS - Tracking Large files Jovian - JupyterNB based tracking Neptune.Ml Hangar Py - Versioning Tensor Data