ML Models and Dataset Versioning

Kurian Benoy

October 13, 2019

Transcript

  1. ML MODELS AND DATASET VERSIONING Kurian Benoy

  2. $ WHOAMI: Open source contributor; FOSSASIA OpenTechNights Winner; Kaggle Expert in Kernels
  3. $ WHOAMI: Open source contributor; FOSSASIA OpenTechNights Winner; Kaggle Expert; Final Year BTech student @MEC
  4. OUTLINE: Startup Adventures; Challenges; Model and Dataset Versioning; How I Discovered DVC; Use Case: Versioning Cats and Dogs; Conclusion
  5. Startup Adventures

  6. CHALLENGE 1: ML IS SLOW

  7. CHALLENGE 2: WORKING WITH ML PROJECTS. Most software products take a few seconds to execute: $ git clone project-repo; $ pip install -r requirements.txt
  8. None
  9. CHALLENGE 3: METRIC DRIVEN

  10. CHALLENGE 4: NOT ABLE TO USE GIT. Git is not suitable for projects > 1 GB; git clone becomes slow.
  11. MODEL VERSIONING

  12. TRACKING EXPERIMENTS TRACKING METRICS

  13. Why Model Versioning? > To keep track of experiments > To choose the best ideas >> EXPERIMENTS = CODE + OUTPUTS. Models are outputs.
  14. DATASET VERSIONING

  15. None
  16. 4 TB/day

  17. None
  18. Why Dataset Management? > Moving datasets around > Datasets evolve, so versioning is required >> EXPERIMENTS = CODE + DATA + OUTPUTS. Source code, Datasets
  19. HOW I DISCOVERED DVC

  20. DATA VERSION CONTROL(DVC)

  21. > Experiment and dataset tracking > Open source (3500+ stars) > Built to adopt the best practices of ML > Works well with Git > Language and framework agnostic
  22. VERSIONING CATS & DOGS

  23. DEMO TIME

  24. DVC WORKFLOW

  25. Tracking data: (1) Track 1000 images of cats and dogs; (2) Add 1000 more labelled images of cats & dogs
  26. SWITCHING VERSIONS

  27. CONCLUSION

  28. "Data science is as different from software as software was different from hardware." Nick Elprin, CEO, Domino Data Lab.
  29. Think about your processes(ML projects)

  30. Think about your processes. Try version control for your projects.
  31. Try it out in your ML project!

  32. THANK YOU. Twitter: kurianbenoy2; Email: kurian.bkk@gmail.com; Speaker Deck: bit.ly/mlversion19

  33. APPENDIX

  34. Other Tools for Versioning: MLflow (tracking models and metrics); Git LFS (tracking large files); Jovian (Jupyter notebook based tracking); Neptune.ml; Hangar (versioning tensor data)
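
The demo slides (tracking 1000 images, adding 1000 more, then switching versions) rest on one idea: a dataset version can be identified by a hash of its contents, and the files themselves can live in a cache outside Git. The sketch below is a toy illustration of that idea in plain Python. It does not call DVC and is not how DVC is implemented; the function names `snapshot`, `checkout`, and `demo`, the 12-character version id, and the `.txt` stand-ins for images are all assumptions made for illustration. In DVC itself, the corresponding steps are roughly `dvc add` to track a data directory and `dvc checkout` (after `git checkout`) to switch versions.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def file_hash(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def snapshot(data_dir: Path, cache_dir: Path) -> str:
    """Copy every file into a content-addressed cache and return a version id.

    The version id is a hash of the manifest (relative path -> file hash),
    so identical dataset contents always produce the same id.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    manifest = {}
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        digest = file_hash(path)
        cached = cache_dir / digest
        if not cached.exists():  # unchanged files are stored only once
            shutil.copyfile(path, cached)
        manifest[path.relative_to(data_dir).as_posix()] = digest
    manifest_bytes = json.dumps(manifest, sort_keys=True).encode()
    version = hashlib.sha256(manifest_bytes).hexdigest()[:12]
    (cache_dir / f"{version}.json").write_bytes(manifest_bytes)
    return version


def checkout(version: str, data_dir: Path, cache_dir: Path) -> None:
    """Replace data_dir with the files recorded for the given version."""
    manifest = json.loads((cache_dir / f"{version}.json").read_text())
    if data_dir.exists():
        shutil.rmtree(data_dir)
    for rel_path, digest in manifest.items():
        target = data_dir / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(cache_dir / digest, target)


def demo() -> dict:
    """Track 10 files, add 10 more, then switch between the two versions."""
    with tempfile.TemporaryDirectory() as tmp:
        data, cache = Path(tmp) / "data", Path(tmp) / "cache"
        data.mkdir()
        for i in range(10):  # stand-in for the first batch of images
            (data / f"cat_{i}.txt").write_text(f"cat image {i}")
        v1 = snapshot(data, cache)
        for i in range(10):  # stand-in for the newly labelled images
            (data / f"dog_{i}.txt").write_text(f"dog image {i}")
        v2 = snapshot(data, cache)
        checkout(v1, data, cache)
        files_v1 = len(list(data.iterdir()))
        checkout(v2, data, cache)
        files_v2 = len(list(data.iterdir()))
        return {"v1": v1, "v2": v2, "files_v1": files_v1, "files_v2": files_v2}


if __name__ == "__main__":
    print(demo())
```

Only the small manifest (or, in a real setup, a pointer file) would need to be committed to Git, which is how this kind of design avoids the slow clones of large repositories mentioned under Challenge 4.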