ML Models and Dataset Versioning
Kurian Benoy
October 13, 2019
Transcript
ML MODELS AND DATASET VERSIONING Kurian Benoy
$ WHOAMI
> Open source contributor
> FOSSASIA OpenTechNights Winner
> Kaggle Expert in Kernels
> Final Year BTech student @MEC
OUTLINE
> Startup Adventures
> Challenges
> Model and Dataset versioning
> How I discovered DVC?
> Use case: Versioning dogs and cats
> Conclusion
Startup Adventures
CHALLENGE 1: ML IS SLOW
CHALLENGE 2: WORKING WITH ML PROJECTS
Most software products take a few seconds to execute.
$ git clone project-repo
$ pip install -r requirements.txt
CHALLENGE 3: METRIC DRIVEN
CHALLENGE 4: NOT ABLE TO USE GIT
> git is not suitable for projects > 1GB
> git clone becomes slow
MODEL VERSIONING
TRACKING EXPERIMENTS TRACKING METRICS
Why Model Versioning?
> To keep track of experiments
> To choose the best ideas
>> EXPERIMENTS = CODE + OUTPUTS
Models are outputs
DATASET VERSIONING
4 TB/day
Why Dataset management?
> Moving datasets around
> Datasets evolve, so versioning is required
>> EXPERIMENTS = CODE + DATA + OUTPUTS
Source code, Datasets
HOW I DISCOVERED DVC
DATA VERSION CONTROL(DVC)
> Experiment and dataset tracking
> Open source (3500+ stars)
> Built to adopt the best practices of ML
> Works well with git
> Language and framework agnostic
VERSIONING CATS & DOGS
DEMO TIME
DVC WORKFLOW
Tracking data
1 Tracking 1000 cats and dogs
2 Add 1000 more labelled images of cats & dogs
SWITCHING VERSIONS
CONCLUSION
"Data science as different from software as software was different
from hardware." Nick Elprin, CEO, DominoLabs.
Think about your processes (ML projects)
Try version control for your projects
Try it out in your ML project!
THANK YOU
Twitter: kurianbenoy2
Email: [email protected]
Speaker Deck: bit.ly/mlversion19
APPENDIX
Other Tools for versioning
> MLflow - Tracking models, metrics
> Git LFS - Tracking large files
> Jovian - Jupyter notebook based tracking
> Neptune.ml
> Hangar Py - Versioning tensor data