ML Models and Dataset Versioning
Kurian Benoy
October 13, 2019
Programming
Transcript
ML MODELS AND DATASET VERSIONING Kurian Benoy
$ WHOAMI
> Open source contributor
> FOSSASIA OpenTechNights Winner
> Kaggle Expert in Kernels
> Final Year BTech student @MEC
OUTLINE
> Startup Adventures
> Challenges
> Model and Dataset versioning
> How I discovered DVC?
> Use case: Versioning Dogs and Cats
> Conclusion
Startup Adventures
CHALLENGE 1: ML IS SLOW
CHALLENGE 2: WORKING WITH ML PROJECTS
Most software products take a few seconds to execute.
$ git clone project-repo
$ pip install -r requirements.txt
CHALLENGE 3: METRIC DRIVEN
CHALLENGE 4: NOT ABLE TO USE GIT
> git is not suitable for projects larger than 1 GB
> git clone becomes slow
MODEL VERSIONING
TRACKING EXPERIMENTS
TRACKING METRICS
Why Model Versioning?
> To keep track of experiments
> Choose the best ideas
>> EXPERIMENTS = CODE + OUTPUTS
Models are outputs
DATASET VERSIONING
4 TB/day
Why Dataset management?
> Moving Datasets around
> Datasets evolve, so versioning required
>> EXPERIMENTS = CODE + DATA + OUTPUTS
Source code, Datasets
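One way to picture dataset versioning alongside git is to keep the large file outside the repository, keyed by its content hash, and commit only a small pointer file. The sketch below illustrates that idea only; it is not how DVC or Git-LFS are implemented. The function name `store_by_hash`, the cache layout, and the `.pointer` suffix are hypothetical, and it assumes GNU coreutils (`sha256sum`) is available.

```shell
# Sketch only: store_by_hash, the cache layout, and the .pointer suffix are
# hypothetical names for illustration. Assumes GNU coreutils (sha256sum).

# Copy a large file into a cache directory under its content hash and write
# a small pointer file next to it. Only the pointer would be committed to git.
store_by_hash() {
    local file="$1" cache_dir="$2" hash
    hash="$(sha256sum "$file" | cut -d' ' -f1)"
    mkdir -p "$cache_dir"
    cp "$file" "$cache_dir/$hash"
    printf 'sha256: %s\npath: %s\n' "$hash" "$file" > "$file.pointer"
    echo "$hash"
}

# Example usage (paths are made up):
#   store_by_hash data/train.zip .cache
#   git add data/train.zip.pointer   # the archive itself stays out of git
```

Because the hash changes whenever the file changes, each pointer committed to git identifies exactly one version of the dataset.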
HOW I DISCOVERED DVC
DATA VERSION CONTROL (DVC)
> Experiment and Dataset tracking
> Open-source (3500+ stars)
> Built to adopt the best practices of ML
> Works well with git
> Language and framework agnostic
VERSIONING CATS & DOGS
DEMO TIME
DVC WORKFLOW
Tracking data
1 Tracking 1000 cats and dogs
2 Add 1000 more labelled images of cats & dogs
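The slides do not show the commands used in the demo, but the two steps above could look roughly like this with the DVC command line. This is a sketch under assumptions: the images live in a hypothetical `data/` directory, `git` and `dvc` are installed, and the helper name `snapshot_dataset` and the tags `v1.0` / `v2.0` are made up for illustration.

```shell
# Sketch only: the data/ directory, the helper name, and the tags v1.0 / v2.0
# are assumptions for illustration, not taken from the slides.

# Record the current contents of data/ with DVC, then commit the small
# data.dvc pointer file to git and tag that commit.
snapshot_dataset() {
    local tag="$1" message="$2"
    dvc add data                      # hashes data/ and writes data.dvc
    git add data.dvc .gitignore       # git tracks only the pointer file
    git commit -m "$message"
    git tag -a "$tag" -m "$message"
}

# One-time setup, run inside an existing git repository:
#   dvc init && git commit -m "Initialize DVC"
# Step 1: track the first 1000 images of cats and dogs:
#   snapshot_dataset v1.0 "Dataset v1: 1000 images"
# Step 2: copy 1000 more labelled images into data/, then:
#   snapshot_dataset v2.0 "Dataset v2: 2000 images"
```

Tagging each snapshot is a design choice rather than a DVC requirement; any git commit that contains a `data.dvc` file identifies a dataset version.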
SWITCHING VERSIONS
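Switching between dataset versions with DVC usually means checking out a git revision and then asking DVC to sync the data to it. A minimal sketch, assuming earlier snapshots were committed with a DVC pointer file; the helper name `switch_version` and the revision names in the comments are hypothetical.

```shell
# Sketch only: switch_version and the revision names below are hypothetical.

# Move both the code and the DVC-tracked data to a given git revision.
switch_version() {
    local rev="$1"
    git checkout "$rev"   # restores the code and the .dvc pointer files
    dvc checkout          # syncs the workspace data to match the pointers
}

# Example: go back to the first dataset version, then forward again.
#   switch_version v1.0
#   switch_version v2.0
```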
CONCLUSION
"Data science is as different from software as software was different from hardware."
Nick Elprin, CEO, Domino Data Lab.
Think about your processes (ML projects)
Try version control for your projects
Try it out in your ML project!
THANK YOU
Twitter: kurianbenoy2
Email: [email protected]
Speaker Deck: bit.ly/mlversion19
APPENDIX
Other Tools for versioning
> MLflow - Tracking Models, Metrics
> Git-LFS - Tracking Large files
> Jovian - Jupyter Notebook based tracking
> Neptune.ml
> Hangar Py - Versioning Tensor Data