ML Models and Dataset Versioning
Kurian Benoy
October 13, 2019
Programming
Transcript
ML MODELS AND DATASET VERSIONING Kurian Benoy
$ WHOAMI
Open source contributor
FOSSASIA OpenTechNights Winner
Kaggle Expert in Kernels
Final Year BTech student @MEC
OUTLINE
Startup Adventures
Challenges
Model and Dataset Versioning
How I discovered DVC
Use case: Versioning Dogs and Cats
Conclusion
Startup Adventures
CHALLENGE 1: ML IS SLOW
CHALLENGE 2: WORKING WITH ML PROJECTS
Most software products take a few seconds to execute.
$ git clone project-repo
$ pip install -r requirements.txt
CHALLENGE 3: METRIC DRIVEN
CHALLENGE 4: NOT ABLE TO USE GIT
git is not suitable for projects > 1 GB
git clone becomes slow
MODEL VERSIONING
TRACKING EXPERIMENTS TRACKING METRICS
Why Model Versioning?
> To keep track of experiments
> Choose the best ideas
>> EXPERIMENTS = CODE + OUTPUTS
Models are outputs
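The idea that an experiment is code plus outputs can be sketched as a small record per run. This is only an illustration of the concept, not how any particular tool stores experiments: the `Experiment` class, the `fingerprint` helper, the commit ids, and the accuracy values are all made up for the example.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass
class Experiment:
    """One experiment = code version + outputs (model fingerprint and metrics)."""
    code_version: str   # e.g. a git commit id (placeholder values below)
    model_hash: str     # short content hash of the trained model file
    metrics: dict       # e.g. {"accuracy": 0.91}


def fingerprint(model_bytes: bytes) -> str:
    """Return a short content hash identifying a model artifact."""
    return hashlib.sha256(model_bytes).hexdigest()[:12]


def best_experiment(experiments, metric="accuracy"):
    """Pick the experiment with the highest value of the given metric."""
    return max(experiments, key=lambda e: e.metrics[metric])


# Hypothetical runs: commit ids, model bytes, and scores are invented.
runs = [
    Experiment("commit-a", fingerprint(b"model-weights-1"), {"accuracy": 0.88}),
    Experiment("commit-b", fingerprint(b"model-weights-2"), {"accuracy": 0.93}),
    Experiment("commit-c", fingerprint(b"model-weights-3"), {"accuracy": 0.90}),
]

best = best_experiment(runs)
print(json.dumps(asdict(best), indent=2))
```

Keeping the code version next to the model fingerprint and its metrics is what makes it possible to go back to the best idea later.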
DATASET VERSIONING
4 TB/day
Why Dataset management?
> Moving datasets around
> Datasets evolve, so versioning is required
>> EXPERIMENTS = CODE + DATA + OUTPUTS
Source code, Datasets
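Because datasets evolve, one simple way to tell two versions apart is to fingerprint the whole directory. The sketch below assumes a small folder of files; `dataset_fingerprint` and the file names are hypothetical, and real tools may hash data differently.

```python
import hashlib
import tempfile
from pathlib import Path


def dataset_fingerprint(root: Path) -> str:
    """Hash every file's relative path and contents in a stable order."""
    digest = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode())
        digest.update(path.read_bytes())
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp)
    # Placeholder bytes stand in for real images.
    (data / "cat_001.jpg").write_bytes(b"fake cat image")
    (data / "dog_001.jpg").write_bytes(b"fake dog image")
    v1 = dataset_fingerprint(data)

    # The dataset evolves: one more labelled image arrives.
    (data / "cat_002.jpg").write_bytes(b"another fake cat image")
    v2 = dataset_fingerprint(data)

print("v1:", v1[:12])
print("v2:", v2[:12])
print("changed:", v1 != v2)
```

Storing such a fingerprint alongside the code version and the outputs gives each experiment the full CODE + DATA + OUTPUTS record.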
HOW I DISCOVERED DVC
DATA VERSION CONTROL (DVC)
> Experiment and dataset tracking
> Open source (3500+ stars)
> Built to adopt the best practices of ML
> Works well with git
> Language and framework agnostic
VERSIONING CATS & DOGS
DEMO TIME
DVC WORKFLOW
Tracking data
1. Tracking 1000 cats and dogs
2. Add 1000 more labelled images of cats & dogs
SWITCHING VERSIONS
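The slides do not show the demo commands, so the following is only a conceptual sketch of tracking two dataset versions and switching between them. It does not call DVC and is not its implementation; it just illustrates the general idea of keeping file contents in a hash-named cache and recording each version in a small pointer file that could live in git. The `snapshot` and `checkout` helpers, the `v1.json` and `v2.json` names, and the file contents are all assumptions made for the example.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def snapshot(data_dir: Path, cache: Path, pointer: Path) -> None:
    """Copy each file into a hash-named cache entry and write a pointer file."""
    cache.mkdir(exist_ok=True)
    manifest = {}
    for path in sorted(p for p in data_dir.iterdir() if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if not (cache / digest).exists():
            shutil.copy(path, cache / digest)
        manifest[path.name] = digest
    pointer.write_text(json.dumps(manifest, indent=2))


def checkout(pointer: Path, cache: Path, data_dir: Path) -> None:
    """Rebuild the data directory so it matches the given pointer file."""
    manifest = json.loads(pointer.read_text())
    shutil.rmtree(data_dir, ignore_errors=True)
    data_dir.mkdir()
    for name, digest in manifest.items():
        shutil.copy(cache / digest, data_dir / name)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    data, cache = root / "data", root / "cache"
    data.mkdir()

    # Version 1: a first batch of images (2 files stand in for 1000).
    (data / "cat_1.jpg").write_bytes(b"cat one")
    (data / "dog_1.jpg").write_bytes(b"dog one")
    snapshot(data, cache, root / "v1.json")

    # Version 2: more labelled images are added.
    (data / "cat_2.jpg").write_bytes(b"cat two")
    (data / "dog_2.jpg").write_bytes(b"dog two")
    snapshot(data, cache, root / "v2.json")

    # Switch back to version 1, then forward to version 2.
    checkout(root / "v1.json", cache, data)
    files_v1 = sorted(p.name for p in data.iterdir())
    checkout(root / "v2.json", cache, data)
    files_v2 = sorted(p.name for p in data.iterdir())

print(files_v1)
print(files_v2)
```

Only the small pointer files change between versions, which is why this style of tracking can sit next to git without putting large files in the repository.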
CONCLUSION
"Data science is as different from software as software was different from hardware."
Nick Elprin, CEO, Domino Data Lab
Think about your processes (ML projects)
Try version control for your projects
Try it out in your ML project!
THANK YOU
Twitter: kurianbenoy2
Email: [email protected]
Speaker Deck: bit.ly/mlversion19
APPENDIX
Other Tools for versioning
MLflow - Tracking models, metrics
Git LFS - Tracking large files
Jovian - Jupyter notebook based tracking
Neptune.ml
Hangar Py - Versioning tensor data