Upgrade to PRO for Only $50/Year—Limited-Time Offer! 🔥
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
ML Models and Dataset Versioning
Search
Kurian Benoy
October 13, 2019
Programming
0
490
ML Models and Dataset Versioning
Kurian Benoy
October 13, 2019
Tweet
Share
More Decks by Kurian Benoy
See All by Kurian Benoy
How I ended up maintaining a python package with 1M+ downloads so far?
kurianbenoy
0
9
MTech Final Project - Presentation Slides
kurianbenoy
0
16
Project Review Report 5 - MTech Project
kurianbenoy
1
40
Joy of Programming
kurianbenoy
0
46
Expert Interaction on ML
kurianbenoy
0
88
Project Review Report 4 - Robust Speech Recognition in Malayalam
kurianbenoy
0
100
Final project report - Phase 1
kurianbenoy
0
67
Project Review Slides
kurianbenoy
0
27
Demysitfying Async&Await in Python and JavaScript
kurianbenoy
0
180
Other Decks in Programming
See All in Programming
React Native New Architecture 移行実践報告
taminif
1
150
複数人でのCLI/Infrastructure as Codeの暮らしを良くする
shmokmt
5
2.2k
配送計画の均等化機能を提供する取り組みについて(⽩⾦鉱業 Meetup Vol.21@六本⽊(数理最適化編))
izu_nori
0
140
生成AIを利用するだけでなく、投資できる組織へ
pospome
0
220
AIコーディングエージェント(Gemini)
kondai24
0
200
テストやOSS開発に役立つSetup PHP Action
matsuo_atsushi
0
150
251126 TestState APIってなんだっけ?Step Functionsテストどう変わる?
east_takumi
0
310
AIコーディングエージェント(skywork)
kondai24
0
150
ZOZOにおけるAI活用の現在 ~モバイルアプリ開発でのAI活用状況と事例~
zozotech
PRO
8
5.4k
Context is King? 〜Verifiability時代とコンテキスト設計 / Beyond "Context is King"
rkaga
6
990
まだ間に合う!Claude Code元年をふりかえる
nogu66
2
240
バックエンドエンジニアによる Amebaブログ K8s 基盤への CronJobの導入・運用経験
sunabig
0
140
Featured
See All Featured
Site-Speed That Sticks
csswizardry
13
990
[RailsConf 2023 Opening Keynote] The Magic of Rails
eileencodes
31
9.8k
The Power of CSS Pseudo Elements
geoffreycrofte
80
6.1k
No one is an island. Learnings from fostering a developers community.
thoeni
21
3.5k
A better future with KSS
kneath
240
18k
Balancing Empowerment & Direction
lara
5
790
Large-scale JavaScript Application Architecture
addyosmani
515
110k
Fight the Zombie Pattern Library - RWD Summit 2016
marcelosomers
234
17k
The Cost Of JavaScript in 2023
addyosmani
55
9.3k
Testing 201, or: Great Expectations
jmmastey
46
7.8k
How to Think Like a Performance Engineer
csswizardry
28
2.4k
Exploring the Power of Turbo Streams & Action Cable | RailsConf2023
kevinliebholz
36
6.2k
Transcript
ML MODELS AND DATASET VERSIONING Kurian Benoy
$ WHOAMI Open source contributor FOSSASIA OpenTechNights Winner Kaggle Expert
in Kernels
$ WHOAMI Open source contributor FOSSASIA OpenTechNights Winner Kaggle Expert
Final Year BTech student @MEC
OUTLINE Start up Adventures Challenges Model and Dataset versioning How
I discovered DVC? Use case: Versioning dogs and Cats Conclusion
Startup Adventures
CHALLENGE 1: ML IS SLOW
CHALLENGE 2: WORKING WITH ML PROJECTS Most software products take
a few seconds to execute. $ git clone project-repo $ pip install -r requirements.txt
None
CHALLENGE 3: METRIC DRIVEN
CHALLENGE 4: NOT ABLE TO USE GIT git not suitable
for projects > 1GB git clone becomes slow
MODEL VERSIONING
TRACKING EXPERIMENTS TRACKING METRICS
Why Model Versioning? > To keep track of experiments >
Choose the best ideas >> EXPERIMENTS = CODE + OUTPUTS Models are outputs
DATASET VERSIONING
None
4 TB/day
None
Why Dataset management? > Moving Datasets around > Datasets evolve,
so versioning required >> EXPERIMENTS = CODE + DATA + OUTPUTS Source code, Datasets
HOW I DISCOVERED DVC
DATA VERSION CONTROL(DVC)
> Experiment and Dataset tracking > Open-source(3500+ stars) > Build
to adopt the best practises of ML > Works well with git > Language and framework agnostic
VERSIONING CATS & DOGS
DEMO TIME
DVC WORKFLOW
Tracking data 1 Tracking 1000 cats and dogs 2 Add
1000 more labelled images of cats & dogs
SWITCHING VERSIONS
CONCLUSION
"Data science as different from software as software was different
from hardware." Nick Elprin, CEO, DominoLabs.
Think about your processes(ML projects)
Think about your processes Try to version control for your
projects
Try it out in your ML project!
THANK YOU Twitter: kurianbenoy2 Email :
[email protected]
Speaker Deck: bit.ly/mlversion19
APPENDIX
Other Tools for versioning ML Flow - Tracking Models, Metrics
Git-LFS - Tracking Large files Jovian - JupyterNB based tracking Neptune.Ml Hangar Py - Versioning Tensor Data