Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
ML Models and Dataset Versioning
Search
Kurian Benoy
October 13, 2019
Programming
0
460
ML Models and Dataset Versioning
Kurian Benoy
October 13, 2019
Tweet
Share
More Decks by Kurian Benoy
See All by Kurian Benoy
Project Review Report 5 - MTech Project
kurianbenoy
1
32
Joy of Programming
kurianbenoy
0
32
Expert Interaction on ML
kurianbenoy
0
67
Project Review Report 4 - Robust Speech Recognition in Malayalam
kurianbenoy
0
63
Final project report - Phase 1
kurianbenoy
0
52
Project Review Slides
kurianbenoy
0
21
Demysitfying Async&Await in Python and JavaScript
kurianbenoy
0
150
Tensorflow User Groups(TFUG) India Summit
kurianbenoy
0
74
Malayalam TTS - 1
kurianbenoy
0
84
Other Decks in Programming
See All in Programming
custom_lintで始めるチームルール管理
akaboshinit
0
200
PHP で学ぶ OAuth 入門
azuki
1
130
サービスレベルを管理してアジャイルを加速しよう!! / slm-accelerate-agility
tomoyakitaura
1
170
趣味全開のAITuber開発
kokushin
0
190
gen_statem - OTP's Unsung Hero
whatyouhide
1
190
5年間継続して開発した自作OSSの記録
bebeji_nappa
0
170
SQL Server ベクトル検索
odashinsuke
0
170
Rollupのビルド時間高速化によるプレビュー表示速度改善とバンドラとASTを駆使したプロダクト開発の難しさ
plaidtech
PRO
1
160
Kubernetesで実現できるPlatform Engineering の現在地
nwiizo
3
1.9k
AIコーディングワークフローの試行 〜AIエージェント×ワークフローでの自動化を目指して〜
rkaga
2
3.4k
AI Coding Agent Enablement - エージェントを自走させよう
yukukotani
13
5.8k
英語 × の私が、生成AIの力を借りて、OSSに初コントリビュートした話
personabb
0
180
Featured
See All Featured
Into the Great Unknown - MozCon
thekraken
37
1.7k
Building Applications with DynamoDB
mza
94
6.3k
Refactoring Trust on Your Teams (GOTO; Chicago 2020)
rmw
34
2.9k
A designer walks into a library…
pauljervisheath
205
24k
Imperfection Machines: The Place of Print at Facebook
scottboms
267
13k
Performance Is Good for Brains [We Love Speed 2024]
tammyeverts
9
740
Agile that works and the tools we love
rasmusluckow
328
21k
Fashionably flexible responsive web design (full day workshop)
malarkey
407
66k
Building Better People: How to give real-time feedback that sticks.
wjessup
367
19k
Java REST API Framework Comparison - PWX 2021
mraible
30
8.5k
Building a Modern Day E-commerce SEO Strategy
aleyda
40
7.2k
Testing 201, or: Great Expectations
jmmastey
42
7.4k
Transcript
ML MODELS AND DATASET VERSIONING Kurian Benoy
$ WHOAMI Open source contributor FOSSASIA OpenTechNights Winner Kaggle Expert
in Kernels
$ WHOAMI Open source contributor FOSSASIA OpenTechNights Winner Kaggle Expert
Final Year BTech student @MEC
OUTLINE Start up Adventures Challenges Model and Dataset versioning How
I discovered DVC? Use case: Versioning dogs and Cats Conclusion
Startup Adventures
CHALLENGE 1: ML IS SLOW
CHALLENGE 2: WORKING WITH ML PROJECTS Most software products take
a few seconds to execute. $ git clone project-repo $ pip install -r requirements.txt
None
CHALLENGE 3: METRIC DRIVEN
CHALLENGE 4: NOT ABLE TO USE GIT git not suitable
for projects > 1GB git clone becomes slow
MODEL VERSIONING
TRACKING EXPERIMENTS TRACKING METRICS
Why Model Versioning? > To keep track of experiments >
Choose the best ideas >> EXPERIMENTS = CODE + OUTPUTS Models are outputs
DATASET VERSIONING
None
4 TB/day
None
Why Dataset management? > Moving Datasets around > Datasets evolve,
so versioning required >> EXPERIMENTS = CODE + DATA + OUTPUTS Source code, Datasets
HOW I DISCOVERED DVC
DATA VERSION CONTROL(DVC)
> Experiment and Dataset tracking > Open-source(3500+ stars) > Build
to adopt the best practises of ML > Works well with git > Language and framework agnostic
VERSIONING CATS & DOGS
DEMO TIME
DVC WORKFLOW
Tracking data 1 Tracking 1000 cats and dogs 2 Add
1000 more labelled images of cats & dogs
SWITCHING VERSIONS
CONCLUSION
"Data science as different from software as software was different
from hardware." Nick Elprin, CEO, DominoLabs.
Think about your processes(ML projects)
Think about your processes Try to version control for your
projects
Try it out in your ML project!
THANK YOU Twitter: kurianbenoy2 Email :
[email protected]
Speaker Deck: bit.ly/mlversion19
APPENDIX
Other Tools for versioning ML Flow - Tracking Models, Metrics
Git-LFS - Tracking Large files Jovian - JupyterNB based tracking Neptune.Ml Hangar Py - Versioning Tensor Data