Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
ML Models and Dataset Versioning
Search
Kurian Benoy
October 13, 2019
Programming
0
440
ML Models and Dataset Versioning
Kurian Benoy
October 13, 2019
Tweet
Share
More Decks by Kurian Benoy
See All by Kurian Benoy
Project Review Report 5 - MTech Project
kurianbenoy
1
27
Joy of Programming
kurianbenoy
0
17
Expert Interaction on ML
kurianbenoy
0
49
Project Review Report 4 - Robust Speech Recognition in Malayalam
kurianbenoy
0
52
Final project report - Phase 1
kurianbenoy
0
46
Project Review Slides
kurianbenoy
0
18
Demysitfying Async&Await in Python and JavaScript
kurianbenoy
0
130
Tensorflow User Groups(TFUG) India Summit
kurianbenoy
0
55
Malayalam TTS - 1
kurianbenoy
0
72
Other Decks in Programming
See All in Programming
暇に任せてProxmoxコンソール 作ってみました
karugamo
1
720
クリエイティブコーディングとRuby学習 / Creative Coding and Learning Ruby
chobishiba
0
3.9k
rails stats で紐解く ANDPAD のイマを支える技術たち
andpad
1
290
Symfony Mapper Component
soyuka
2
730
採用事例の少ないSvelteを選んだ理由と それを正解にするためにやっていること
oekazuma
2
1k
急成長期の品質とスピードを両立するフロントエンド技術基盤
soarteclab
0
920
Stackless и stackful? Корутины и асинхронность в Go
lamodatech
0
650
KubeCon + CloudNativeCon NA 2024 Overviewat Kubernetes Meetup Tokyo #68 / amsy810_k8sjp68
masayaaoyama
0
250
LLM Supervised Fine-tuningの理論と実践
datanalyticslabo
3
970
Webエンジニア主体のモバイルチームの 生産性を高く保つためにやったこと
igreenwood
0
330
testcontainers のススメ
sgash708
1
120
Итераторы в Go 1.23: зачем они нужны, как использовать, и насколько они быстрые?
lamodatech
0
660
Featured
See All Featured
Why You Should Never Use an ORM
jnunemaker
PRO
54
9.1k
Fight the Zombie Pattern Library - RWD Summit 2016
marcelosomers
232
17k
Product Roadmaps are Hard
iamctodd
PRO
49
11k
Principles of Awesome APIs and How to Build Them.
keavy
126
17k
The Cost Of JavaScript in 2023
addyosmani
45
7k
[RailsConf 2023] Rails as a piece of cake
palkan
53
5k
Why Our Code Smells
bkeepers
PRO
335
57k
Making Projects Easy
brettharned
116
5.9k
Testing 201, or: Great Expectations
jmmastey
40
7.1k
The Art of Programming - Codeland 2020
erikaheidi
53
13k
Creating an realtime collaboration tool: Agile Flush - .NET Oxford
marcduiker
26
1.9k
The Psychology of Web Performance [Beyond Tellerrand 2023]
tammyeverts
45
2.2k
Transcript
ML MODELS AND DATASET VERSIONING Kurian Benoy
$ WHOAMI Open source contributor FOSSASIA OpenTechNights Winner Kaggle Expert
in Kernels
$ WHOAMI Open source contributor FOSSASIA OpenTechNights Winner Kaggle Expert
Final Year BTech student @MEC
OUTLINE Start up Adventures Challenges Model and Dataset versioning How
I discovered DVC? Use case: Versioning dogs and Cats Conclusion
Startup Adventures
CHALLENGE 1: ML IS SLOW
CHALLENGE 2: WORKING WITH ML PROJECTS Most software products take
a few seconds to execute. $ git clone project-repo $ pip install -r requirements.txt
None
CHALLENGE 3: METRIC DRIVEN
CHALLENGE 4: NOT ABLE TO USE GIT git not suitable
for projects > 1GB git clone becomes slow
MODEL VERSIONING
TRACKING EXPERIMENTS TRACKING METRICS
Why Model Versioning? > To keep track of experiments >
Choose the best ideas >> EXPERIMENTS = CODE + OUTPUTS Models are outputs
DATASET VERSIONING
None
4 TB/day
None
Why Dataset management? > Moving Datasets around > Datasets evolve,
so versioning required >> EXPERIMENTS = CODE + DATA + OUTPUTS Source code, Datasets
HOW I DISCOVERED DVC
DATA VERSION CONTROL(DVC)
> Experiment and Dataset tracking > Open-source(3500+ stars) > Build
to adopt the best practises of ML > Works well with git > Language and framework agnostic
VERSIONING CATS & DOGS
DEMO TIME
DVC WORKFLOW
Tracking data 1 Tracking 1000 cats and dogs 2 Add
1000 more labelled images of cats & dogs
SWITCHING VERSIONS
CONCLUSION
"Data science as different from software as software was different
from hardware." Nick Elprin, CEO, DominoLabs.
Think about your processes(ML projects)
Think about your processes Try to version control for your
projects
Try it out in your ML project!
THANK YOU Twitter: kurianbenoy2 Email :
[email protected]
Speaker Deck: bit.ly/mlversion19
APPENDIX
Other Tools for versioning ML Flow - Tracking Models, Metrics
Git-LFS - Tracking Large files Jovian - JupyterNB based tracking Neptune.Ml Hangar Py - Versioning Tensor Data