Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
ML Models and Dataset Versioning
Search
Kurian Benoy
October 13, 2019
Programming
0
440
ML Models and Dataset Versioning
Kurian Benoy
October 13, 2019
Tweet
Share
More Decks by Kurian Benoy
See All by Kurian Benoy
Project Review Report 5 - MTech Project
kurianbenoy
1
30
Joy of Programming
kurianbenoy
0
23
Expert Interaction on ML
kurianbenoy
0
58
Project Review Report 4 - Robust Speech Recognition in Malayalam
kurianbenoy
0
56
Final project report - Phase 1
kurianbenoy
0
47
Project Review Slides
kurianbenoy
0
18
Demysitfying Async&Await in Python and JavaScript
kurianbenoy
0
140
Tensorflow User Groups(TFUG) India Summit
kurianbenoy
0
61
Malayalam TTS - 1
kurianbenoy
0
76
Other Decks in Programming
See All in Programming
見えないメモリを観測する: PHP 8.4 `pg_result_memory_size()` とSQL結果のメモリ管理
kentaroutakeda
0
930
Lookerは可視化だけじゃない。UIコンポーネントもあるんだ!
ymd65536
1
130
『改訂新版 良いコード/悪いコードで学ぶ設計入門』活用方法−爆速でスキルアップする!効果的な学習アプローチ / effective-learning-of-good-code
minodriven
28
4.1k
return文におけるstd::moveについて
onihusube
1
1.4k
Swiftコンパイラ超入門+async関数の仕組み
shiz
0
170
責務を分離するための例外設計 - PHPカンファレンス 2024
kajitack
9
2.3k
生成AIでGitHubソースコード取得して仕様書を作成
shukob
0
630
Fixstars高速化コンテスト2024準優勝解法
eijirou
0
190
React 19でお手軽にCSS-in-JSを自作する
yukukotani
5
560
DevFest - Serverless 101 with Google Cloud Functions
tunmise
0
140
20241217 競争力強化とビジネス価値創出への挑戦:モノタロウのシステムモダナイズ、開発組織の進化と今後の展望
monotaro
PRO
0
280
月刊 競技プログラミングをお仕事に役立てるには
terryu16
1
1.2k
Featured
See All Featured
Six Lessons from altMBA
skipperchong
27
3.6k
Save Time (by Creating Custom Rails Generators)
garrettdimon
PRO
29
960
Chrome DevTools: State of the Union 2024 - Debugging React & Beyond
addyosmani
3
240
10 Git Anti Patterns You Should be Aware of
lemiorhan
PRO
656
59k
Improving Core Web Vitals using Speculation Rules API
sergeychernyshev
3
180
Keith and Marios Guide to Fast Websites
keithpitt
410
22k
Automating Front-end Workflow
addyosmani
1366
200k
[Rails World 2023 - Day 1 Closing Keynote] - The Magic of Rails
eileencodes
33
2k
Into the Great Unknown - MozCon
thekraken
34
1.6k
The Myth of the Modular Monolith - Day 2 Keynote - Rails World 2024
eileencodes
19
2.3k
Fantastic passwords and where to find them - at NoRuKo
philnash
50
2.9k
Building a Scalable Design System with Sketch
lauravandoore
460
33k
Transcript
ML MODELS AND DATASET VERSIONING Kurian Benoy
$ WHOAMI Open source contributor FOSSASIA OpenTechNights Winner Kaggle Expert
in Kernels
$ WHOAMI Open source contributor FOSSASIA OpenTechNights Winner Kaggle Expert
Final Year BTech student @MEC
OUTLINE Start up Adventures Challenges Model and Dataset versioning How
I discovered DVC? Use case: Versioning dogs and Cats Conclusion
Startup Adventures
CHALLENGE 1: ML IS SLOW
CHALLENGE 2: WORKING WITH ML PROJECTS Most software products take
a few seconds to execute. $ git clone project-repo $ pip install -r requirements.txt
None
CHALLENGE 3: METRIC DRIVEN
CHALLENGE 4: NOT ABLE TO USE GIT git not suitable
for projects > 1GB git clone becomes slow
MODEL VERSIONING
TRACKING EXPERIMENTS TRACKING METRICS
Why Model Versioning? > To keep track of experiments >
Choose the best ideas >> EXPERIMENTS = CODE + OUTPUTS Models are outputs
DATASET VERSIONING
None
4 TB/day
None
Why Dataset management? > Moving Datasets around > Datasets evolve,
so versioning required >> EXPERIMENTS = CODE + DATA + OUTPUTS Source code, Datasets
HOW I DISCOVERED DVC
DATA VERSION CONTROL(DVC)
> Experiment and Dataset tracking > Open-source(3500+ stars) > Build
to adopt the best practises of ML > Works well with git > Language and framework agnostic
VERSIONING CATS & DOGS
DEMO TIME
DVC WORKFLOW
Tracking data 1 Tracking 1000 cats and dogs 2 Add
1000 more labelled images of cats & dogs
SWITCHING VERSIONS
CONCLUSION
"Data science as different from software as software was different
from hardware." Nick Elprin, CEO, DominoLabs.
Think about your processes(ML projects)
Think about your processes Try to version control for your
projects
Try it out in your ML project!
THANK YOU Twitter: kurianbenoy2 Email :
[email protected]
Speaker Deck: bit.ly/mlversion19
APPENDIX
Other Tools for versioning ML Flow - Tracking Models, Metrics
Git-LFS - Tracking Large files Jovian - JupyterNB based tracking Neptune.Ml Hangar Py - Versioning Tensor Data