ML Models and Dataset Versioning
Kurian Benoy
October 13, 2019
Transcript
ML MODELS AND DATASET VERSIONING Kurian Benoy
$ WHOAMI
Open source contributor
FOSSASIA OpenTechNights Winner
Kaggle Expert in Kernels
Final Year BTech student @MEC
OUTLINE
Start up Adventures
Challenges
Model and Dataset versioning
How I discovered DVC?
Use case: Versioning dogs and Cats
Conclusion
Startup Adventures
CHALLENGE 1: ML IS SLOW
CHALLENGE 2: WORKING WITH ML PROJECTS
Most software products take a few seconds to execute.
$ git clone project-repo
$ pip install -r requirements.txt
CHALLENGE 3: METRIC DRIVEN
CHALLENGE 4: NOT ABLE TO USE GIT
git is not suitable for projects > 1 GB
git clone becomes slow
MODEL VERSIONING
TRACKING EXPERIMENTS TRACKING METRICS
Why Model Versioning?
> To keep track of experiments
> Choose the best ideas
>> EXPERIMENTS = CODE + OUTPUTS
Models are outputs
DATASET VERSIONING
4 TB/day
Why Dataset management?
> Moving Datasets around
> Datasets evolve, so versioning required
>> EXPERIMENTS = CODE + DATA + OUTPUTS
Source code, Datasets
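One way to picture "EXPERIMENTS = CODE + DATA + OUTPUTS" is as a small record per run that pins all three parts next to the metric being tracked. The sketch below is only an illustration: the `Experiment` class, the `fingerprint` and `best` helpers, the commit ids, and the accuracy numbers are all made up for this example and do not come from the talk or from any tool.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Experiment:
    code_rev: str     # code: e.g. a git commit id (hypothetical values below)
    data_hash: str    # data: fingerprint of the dataset version used
    model_hash: str   # output: fingerprint of the trained model file
    accuracy: float   # metric recorded for this run (illustrative only)


def fingerprint(blob: bytes) -> str:
    """Return a short, stable identifier for some bytes (MD5 hex digest)."""
    return hashlib.md5(blob).hexdigest()


def best(experiments):
    """Pick the run with the highest recorded accuracy."""
    return max(experiments, key=lambda e: e.accuracy)


# Three made-up runs: the byte strings stand in for real datasets and models.
runs = [
    Experiment("a1b2c3", fingerprint(b"1000 images"), fingerprint(b"model-1"), 0.81),
    Experiment("a1b2c3", fingerprint(b"2000 images"), fingerprint(b"model-2"), 0.88),
    Experiment("d4e5f6", fingerprint(b"2000 images"), fingerprint(b"model-3"), 0.85),
]

winner = best(runs)
print(winner.code_rev, winner.data_hash[:8], winner.accuracy)
```

Because each record carries the code revision and the data fingerprint, the best run can be traced back to exactly the code and dataset that produced it, which is the point of versioning models and datasets together.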
HOW I DISCOVERED DVC
DATA VERSION CONTROL(DVC)
> Experiment and Dataset tracking
> Open-source (3500+ stars)
> Built to adopt the best practices of ML
> Works well with git
> Language and framework agnostic
VERSIONING CATS & DOGS
DEMO TIME
DVC WORKFLOW
Tracking data
1. Tracking 1000 cats and dogs
2. Add 1000 more labelled images of cats & dogs
SWITCHING VERSIONS
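The demo itself is not captured in the transcript. As a rough mental model of tracking two dataset versions and switching between them, here is a toy sketch of content-hash snapshots: file bytes go into a cache keyed by hash, and each version is a small manifest that could be committed to git. This is not DVC's implementation; `snapshot`, `checkout`, `demo`, and the file names are hypothetical, and three tiny files stand in for the 1000 images.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def snapshot(data_dir: Path, cache_dir: Path) -> dict:
    """Copy each file into a content-addressed cache and return a manifest.

    The manifest maps relative file paths to MD5 digests of the file bytes.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    manifest = {}
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        digest = hashlib.md5(path.read_bytes()).hexdigest()
        cached = cache_dir / digest
        if not cached.exists():  # identical content is stored only once
            shutil.copy2(path, cached)
        manifest[path.relative_to(data_dir).as_posix()] = digest
    return manifest


def checkout(manifest: dict, data_dir: Path, cache_dir: Path) -> None:
    """Rebuild data_dir so it holds exactly the files listed in the manifest."""
    if data_dir.exists():
        shutil.rmtree(data_dir)
    data_dir.mkdir(parents=True)
    for rel_path, digest in manifest.items():
        target = data_dir / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(cache_dir / digest, target)


def demo() -> tuple:
    """Create version 1, grow it into version 2, then switch back to 1."""
    with tempfile.TemporaryDirectory() as tmp:
        data, cache = Path(tmp) / "data", Path(tmp) / "cache"
        data.mkdir()
        for i in range(3):  # stand-ins for the first batch of images
            (data / f"cat_{i}.jpg").write_bytes(f"cat {i}".encode())
        v1 = snapshot(data, cache)
        for i in range(3):  # stand-ins for the extra labelled images
            (data / f"dog_{i}.jpg").write_bytes(f"dog {i}".encode())
        v2 = snapshot(data, cache)
        checkout(v1, data, cache)  # switch the working copy back to v1
        return len(v1), len(v2), len(list(data.iterdir()))


if __name__ == "__main__":
    print(demo())
```

With DVC itself, the corresponding steps are usually `dvc add` on the dataset directory plus a git commit of the generated `.dvc` file, and `git checkout` of the desired revision followed by `dvc checkout` to switch the data.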
CONCLUSION
"Data science is as different from software as software was different from hardware." Nick Elprin, CEO, Domino Data Lab.
Think about your processes (ML projects)
Try version control for your projects
Try it out in your ML project!
THANK YOU
Twitter: kurianbenoy2
Email: [email protected]
Speaker Deck: bit.ly/mlversion19
APPENDIX
Other Tools for versioning
MLflow - Tracking Models, Metrics
Git LFS - Tracking large files
Jovian - Jupyter Notebook based tracking
Neptune.ml
Hangar Py - Versioning Tensor Data