ML Models and Dataset Versioning
Kurian Benoy
October 13, 2019
Transcript
ML MODELS AND DATASET VERSIONING Kurian Benoy
$ WHOAMI
> Open source contributor
> FOSSASIA OpenTechNights Winner
> Kaggle Expert in Kernels
> Final Year BTech student @MEC
OUTLINE
> Startup adventures
> Challenges
> Model and dataset versioning
> How I discovered DVC
> Use case: versioning cats and dogs
> Conclusion
Startup Adventures
CHALLENGE 1: ML IS SLOW
CHALLENGE 2: WORKING WITH ML PROJECTS
Most software products take a few seconds to execute.
$ git clone project-repo
$ pip install -r requirements.txt
CHALLENGE 3: METRIC DRIVEN
CHALLENGE 4: NOT ABLE TO USE GIT
Git is not suitable for projects larger than 1 GB
git clone becomes slow
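One rough way to tell whether a project has crossed the size mentioned on this slide is to measure the working directory before committing it to Git. The helper below is only a sketch: the function name `exceeds_git_limit` and the default threshold of 1 GiB (1048576 KiB) are assumptions made for illustration, not something shown in the talk.

```shell
# Sketch: succeed (exit status 0) when a directory is larger than a limit.
# exceeds_git_limit is a hypothetical helper name; the default limit of
# 1048576 KiB (1 GiB) mirrors the "> 1GB" figure on the slide.
exceeds_git_limit() {
    dir="$1"
    limit_kib="${2:-1048576}"
    # du -sk prints the total size in KiB followed by the path.
    size_kib=$(du -sk "$dir" | awk '{print $1}')
    [ "$size_kib" -gt "$limit_kib" ]
}

# Example (not run here):
#   if exceeds_git_limit data; then echo "data/ is too large for plain Git"; fi
```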
MODEL VERSIONING
TRACKING EXPERIMENTS TRACKING METRICS
Why Model Versioning?
> To keep track of experiments
> Choose the best ideas
>> EXPERIMENTS = CODE + OUTPUTS
Models are outputs
DATASET VERSIONING
4 TB/day
Why Dataset Management?
> Moving datasets around
> Datasets evolve, so versioning is required
>> EXPERIMENTS = CODE + DATA + OUTPUTS
Source code, Datasets
HOW I DISCOVERED DVC
DATA VERSION CONTROL (DVC)
> Experiment and dataset tracking
> Open source (3500+ stars)
> Built to adopt the best practices of ML
> Works well with Git
> Language and framework agnostic
VERSIONING CATS & DOGS
DEMO TIME
DVC WORKFLOW
Tracking data
1. Tracking 1000 cats and dogs
2. Add 1000 more labelled images of cats & dogs
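The two tracking steps above can be sketched with the standard `dvc add`, `git add`, `git commit`, and `git tag` commands. This is a minimal sketch, not the commands from the live demo: it assumes Git and DVC are installed, that the images live in a directory called `data`, and that the helper name `track_dataset` and the tags `v1.0` and `v2.0` are hypothetical.

```shell
# Sketch: record the current state of a dataset directory as a tagged version.
# track_dataset is a hypothetical helper; "data", "v1.0" and "v2.0" in the
# example session are assumed names, not taken from the talk.
track_dataset() {
    data_dir="${1%/}"   # dataset directory, trailing slash removed
    tag="$2"            # Git tag naming this dataset version
    msg="$3"            # commit and tag message

    # dvc add caches the files and writes a small pointer file, <dir>.dvc.
    dvc add "$data_dir" || return 1
    # Git tracks only the pointer file and the .gitignore entry DVC created.
    git add "$data_dir.dvc" .gitignore || return 1
    git commit -m "$msg" || return 1
    git tag -a "$tag" -m "$msg"
}

# Example session (not run here):
#   git init && dvc init && git commit -m "Initialize DVC"
#   track_dataset data v1.0 "Track 1000 cats and dogs"
#   ... copy 1000 more labelled images into data/ ...
#   track_dataset data v2.0 "Add 1000 more labelled images"
```

Because Git only stores the pointer file, the repository stays small even as the image directory grows.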
SWITCHING VERSIONS
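Switching between dataset versions usually takes two steps: Git restores the pointer file from the chosen revision, then `dvc checkout` brings the matching data back into the workspace. The sketch below assumes the dataset was tracked as `data.dvc` and tagged `v1.0`; both names and the helper `switch_dataset_version` are hypothetical. If the files are not in the local DVC cache, a `dvc pull` from a configured remote would be needed first.

```shell
# Sketch: restore the dataset as it was at a given Git revision.
# switch_dataset_version is a hypothetical helper name.
switch_dataset_version() {
    rev="$1"        # Git tag, branch, or commit, e.g. v1.0
    dvc_file="$2"   # DVC pointer file, e.g. data.dvc

    # Step 1: take only the pointer file from the requested revision.
    git checkout "$rev" -- "$dvc_file" || return 1
    # Step 2: sync the workspace data with what the pointer file describes.
    dvc checkout "$dvc_file"
}

# Example (not run here):
#   switch_dataset_version v1.0 data.dvc   # back to the first 1000 images
```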
CONCLUSION
"Data science is as different from software as software was different from hardware." Nick Elprin, CEO, Domino Data Lab
Think about your processes (ML projects)
Try version control for your projects
Try it out in your ML project!
THANK YOU
Twitter: kurianbenoy2
Email: [email protected]
Speaker Deck: bit.ly/mlversion19
APPENDIX
Other Tools for Versioning
> MLflow - Tracking models and metrics
> Git LFS - Tracking large files
> Jovian - Jupyter Notebook based tracking
> Neptune.ml
> Hangar Py - Versioning tensor data