ML Models and Dataset Versioning
Kurian Benoy
October 13, 2019
Transcript
ML MODELS AND DATASET VERSIONING Kurian Benoy
$ WHOAMI
> Open source contributor
> FOSSASIA OpenTechNights Winner
> Kaggle Expert in Kernels
> Final Year BTech student @MEC
OUTLINE
> Startup Adventures
> Challenges
> Model and Dataset Versioning
> How I discovered DVC
> Use case: Versioning Dogs and Cats
> Conclusion
Startup Adventures
CHALLENGE 1: ML IS SLOW
CHALLENGE 2: WORKING WITH ML PROJECTS
Most software products take a few seconds to execute.
$ git clone project-repo
$ pip install -r requirements.txt
CHALLENGE 3: METRIC DRIVEN
CHALLENGE 4: NOT ABLE TO USE GIT
> Git is not suitable for projects larger than 1 GB
> git clone becomes slow
MODEL VERSIONING
TRACKING EXPERIMENTS
TRACKING METRICS
Why Model Versioning?
> To keep track of experiments
> Choose the best ideas
>> EXPERIMENTS = CODE + OUTPUTS
Models are outputs
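A minimal sketch of the "EXPERIMENTS = CODE + OUTPUTS" idea, assuming each run is recorded as a code version, a saved model, and one metric. The `Experiment` class, the field names, the commit hashes, paths, and accuracy values are all made up for illustration; they are not from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Experiment:
    code_version: str   # hypothetical identifier, e.g. a git commit hash
    model_path: str     # hypothetical path to the saved model (an output)
    accuracy: float     # the metric being tracked


def best_experiment(experiments):
    """Return the experiment with the highest accuracy."""
    return max(experiments, key=lambda e: e.accuracy)


# Made-up runs, for illustration only.
runs = [
    Experiment("a1b2c3", "models/run1.pkl", 0.81),
    Experiment("d4e5f6", "models/run2.pkl", 0.87),
    Experiment("0a9b8c", "models/run3.pkl", 0.84),
]

best = best_experiment(runs)
print(best.code_version, best.model_path, best.accuracy)
```

Keeping the code version next to the metric is what makes "choose the best ideas" actionable: the winning run can be traced back to the exact code that produced it.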
DATASET VERSIONING
4 TB/day
Why Dataset Management?
> Moving datasets around
> Datasets evolve, so versioning is required
>> EXPERIMENTS = CODE + DATA + OUTPUTS
Source code, Datasets
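One way to picture "EXPERIMENTS = CODE + DATA + OUTPUTS" is to give the dataset a fingerprint that changes whenever its contents change, so that a small identifier can be recorded next to the code version. The sketch below is only an illustration of that idea under simple assumptions (files small enough to read into memory); `dataset_fingerprint` is a hypothetical helper, not part of any tool mentioned in the talk.

```python
import hashlib
import tempfile
from pathlib import Path


def dataset_fingerprint(root):
    """Hash every file's relative path and bytes into one hex digest."""
    digest = hashlib.sha256()
    root = Path(root)
    # Sort so the result does not depend on directory listing order.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())  # reads the whole file; fine for a sketch
        digest.update(b"\0")
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp)
    (data / "cat.1.jpg").write_bytes(b"fake cat image")
    v1 = dataset_fingerprint(data)
    v1_again = dataset_fingerprint(data)
    (data / "dog.1.jpg").write_bytes(b"fake dog image")
    v2 = dataset_fingerprint(data)

print(v1 == v1_again, v1 != v2)
```

The fingerprint is a short string, so it fits comfortably in a git commit even when the dataset itself is far too large for git.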
HOW I DISCOVERED DVC
DATA VERSION CONTROL (DVC)
> Experiment and dataset tracking
> Open source (3500+ stars)
> Built to adopt the best practices of ML
> Works well with git
> Language and framework agnostic
VERSIONING CATS & DOGS
DEMO TIME
DVC WORKFLOW
Tracking data
1. Tracking 1000 cats and dogs
2. Add 1000 more labelled images of cats & dogs
SWITCHING VERSIONS
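The two-step workflow above (track a first batch of images, add more, then switch back) can be imitated with a toy content-addressed store. This is a sketch of the general idea only, not DVC's implementation or its commands: `snapshot` and `checkout` are hypothetical helpers, the cache directory is assumed to live outside the data directory, and four tiny stand-in files replace the 1000-image batches from the talk.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def snapshot(data_dir, cache_dir):
    """Copy each file into the cache under its hash; return {relative path: hash}."""
    data_dir, cache_dir = Path(data_dir), Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    manifest = {}
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        file_hash = hashlib.sha256(path.read_bytes()).hexdigest()
        cached = cache_dir / file_hash
        if not cached.exists():  # identical content is stored only once
            shutil.copyfile(path, cached)
        manifest[path.relative_to(data_dir).as_posix()] = file_hash
    return manifest


def checkout(manifest, cache_dir, data_dir):
    """Make data_dir contain exactly the files listed in the manifest."""
    data_dir, cache_dir = Path(data_dir), Path(cache_dir)
    if data_dir.exists():
        shutil.rmtree(data_dir)
    data_dir.mkdir(parents=True)
    for rel_path, file_hash in manifest.items():
        target = data_dir / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(cache_dir / file_hash, target)


with tempfile.TemporaryDirectory() as tmp:
    data, cache = Path(tmp) / "data", Path(tmp) / "cache"
    data.mkdir()
    # Version 1: the first batch of labelled images.
    (data / "cat.1.jpg").write_bytes(b"cat one")
    (data / "dog.1.jpg").write_bytes(b"dog one")
    v1 = snapshot(data, cache)
    # Version 2: more labelled images are added.
    (data / "cat.2.jpg").write_bytes(b"cat two")
    (data / "dog.2.jpg").write_bytes(b"dog two")
    v2 = snapshot(data, cache)
    # Switch the working directory back to version 1.
    checkout(v1, cache, data)
    restored = sorted(p.name for p in data.iterdir())

print(len(v1), len(v2), restored)
```

Each manifest is small enough to commit to git, while the bulky files stay in the cache; switching versions then means restoring the files a given manifest points to.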
CONCLUSION
"Data science is as different from software as software was different from hardware."
Nick Elprin, CEO, Domino Data Lab
Think about your processes (ML projects)
Try version control for your projects
Try it out in your ML project!
THANK YOU
Twitter: kurianbenoy2
Email: [email protected]
Speaker Deck: bit.ly/mlversion19
APPENDIX
Other Tools for Versioning
> MLflow - Tracking models, metrics
> Git-LFS - Tracking large files
> Jovian - Jupyter notebook based tracking
> Neptune.ml
> Hangar Py - Versioning tensor data