ML Models and Dataset Versioning
Kurian Benoy
October 13, 2019
Transcript
ML MODELS AND DATASET VERSIONING Kurian Benoy
$ WHOAMI
> Open source contributor
> FOSSASIA OpenTechNights Winner
> Kaggle Expert in Kernels
> Final Year BTech student @MEC
OUTLINE
> Startup Adventures
> Challenges
> Model and Dataset Versioning
> How I discovered DVC
> Use case: Versioning Cats and Dogs
> Conclusion
Startup Adventures
CHALLENGE 1: ML IS SLOW
CHALLENGE 2: WORKING WITH ML PROJECTS
Most software products take a few seconds to execute.
$ git clone project-repo
$ pip install -r requirements.txt
CHALLENGE 3: METRIC DRIVEN
CHALLENGE 4: NOT ABLE TO USE GIT
> Git is not suitable for projects larger than 1 GB
> git clone becomes slow
MODEL VERSIONING
TRACKING EXPERIMENTS TRACKING METRICS
Why Model Versioning?
> To keep track of experiments
> Choose the best ideas
>> EXPERIMENTS = CODE + OUTPUTS
Models are outputs
DATASET VERSIONING
4 TB/day
Why Dataset Management?
> Moving datasets around
> Datasets evolve, so versioning is required
>> EXPERIMENTS = CODE + DATA + OUTPUTS
Source code, Datasets
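The equation EXPERIMENTS = CODE + DATA + OUTPUTS can be made concrete with a small sketch. This is not taken from the talk: the function names `dataset_fingerprint` and `experiment_record`, the file names, the commit id, and the metric values are all hypothetical placeholders. The idea is only that an experiment becomes reproducible once the code version, a fingerprint of the exact dataset, and the resulting outputs are recorded together.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def dataset_fingerprint(root: Path) -> str:
    """Hash every file under root (relative path + contents) into one digest."""
    digest = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")  # separator between the path and the file contents
        digest.update(path.read_bytes())
    return digest.hexdigest()


def experiment_record(code_version: str, data_dir: Path, outputs: dict) -> dict:
    """Bundle the three parts of an experiment: code, data, outputs."""
    return {
        "code": code_version,                  # e.g. a Git commit id
        "data": dataset_fingerprint(data_dir),
        "outputs": outputs,                    # metrics, model file names, ...
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp)
        (data / "cat_0001.jpg").write_bytes(b"fake cat pixels")
        (data / "dog_0001.jpg").write_bytes(b"fake dog pixels")

        # "abc123" and the metric values are placeholders, not real results.
        before = experiment_record("abc123", data, {"accuracy": "placeholder"})

        # The dataset evolves: one more labelled image is added.
        (data / "cat_0002.jpg").write_bytes(b"more fake cat pixels")
        after = experiment_record("abc123", data, {"accuracy": "placeholder"})

        print(json.dumps(before, indent=2))
        print("dataset changed:", before["data"] != after["data"])
```

Even with identical code, the two records differ in their `data` field, which is why the code version alone is not enough to identify an ML experiment.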
HOW I DISCOVERED DVC
DATA VERSION CONTROL (DVC)
> Experiment and dataset tracking
> Open source (3500+ stars)
> Built to adopt ML best practices
> Works well with Git
> Language and framework agnostic
VERSIONING CATS & DOGS
DEMO TIME
DVC WORKFLOW
Tracking data
1. Tracking 1000 images of cats and dogs
2. Add 1000 more labelled images of cats & dogs
SWITCHING VERSIONS
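The demo commands are not part of this transcript, so the following is only a minimal sketch of the idea behind tracking a dataset and switching between its versions. It is not DVC's implementation. The `snapshot` and `restore` functions and all file names are hypothetical. Each file is copied into a cache under its content hash, and a small manifest (path to hash) describes one version of the dataset. The manifest is plain text and small enough to commit to Git, while the large files stay outside it. In DVC itself, the corresponding steps are roughly `dvc add` for tracking and `git checkout` followed by `dvc checkout` for switching.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def snapshot(data_dir: Path, cache_dir: Path) -> dict:
    """Copy each file into the cache under its content hash; return a manifest."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    manifest = {}
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        file_hash = hashlib.sha256(path.read_bytes()).hexdigest()
        cached = cache_dir / file_hash
        if not cached.exists():  # identical content is stored only once
            shutil.copyfile(path, cached)
        manifest[path.relative_to(data_dir).as_posix()] = file_hash
    return manifest


def restore(manifest: dict, data_dir: Path, cache_dir: Path) -> None:
    """Make data_dir match the manifest exactly, using files from the cache."""
    if data_dir.exists():
        shutil.rmtree(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    for rel_path, file_hash in manifest.items():
        target = data_dir / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(cache_dir / file_hash, target)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data, cache = Path(tmp) / "data", Path(tmp) / "cache"
        data.mkdir()

        # Version 1: a few stand-in "images" (scaled down from the 1000 in the talk).
        for i in range(3):
            (data / f"cat_{i}.jpg").write_bytes(f"cat {i}".encode())
        v1 = snapshot(data, cache)

        # Version 2: more labelled images are added to the same directory.
        for i in range(3):
            (data / f"dog_{i}.jpg").write_bytes(f"dog {i}".encode())
        v2 = snapshot(data, cache)

        print(json.dumps(v1, indent=2))  # the manifest is small, plain text

        restore(v1, data, cache)  # switch back to version 1
        print("files in v1:", len(list(data.iterdir())))
        restore(v2, data, cache)  # switch forward to version 2
        print("files in v2:", len(list(data.iterdir())))
```

Because files are stored by content hash, the second snapshot only adds the new images to the cache; the unchanged ones are shared between both versions.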
CONCLUSION
"Data science is as different from software as software was different
from hardware." Nick Elprin, CEO, Domino Data Lab
Think about your processes (ML projects)
Try version control for your projects
Try it out in your ML project!
THANK YOU
Twitter: kurianbenoy2
Email: [email protected]
Speaker Deck: bit.ly/mlversion19
APPENDIX
Other Tools for Versioning
> MLflow - Tracking models, metrics
> Git LFS - Tracking large files
> Jovian - Jupyter notebook based tracking
> Neptune.ml
> Hangar Py - Versioning tensor data