Using DVC to accelerate machine learning pipeline development

Takayuki Kasai
June 16, 2021

8th MLOps Study Group Tokyo (Online)
https://mlops.connpass.com/event/211953/

Transcript

 1. ©2021 Wantedly, Inc.
  Using DVC to accelerate machine learning pipeline development
  8th MLOps Study Group Tokyo (Online)
  Takayuki Kasai (@unblee) Wantedly, Inc.
  2021.6.16

 2. What I will talk about
  This talk introduces the problems the Matching team of Wantedly Visit ran into while developing machine learning pipelines, and the approach we took to solve them.

 3. What I will talk about
  • Problems with the development environment
  • Features of an existing tool and its problems
  • What we did
  • How it turned out
  • Current issues

 4. Problems with the development environment

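 5. Machine learning pipeline
  • The flow of the pipeline itself is simple
  • Development is done mainly on Pods on k8s
    • This guarantees reproducibility between production and development environments and lets us scale resources (mainly memory)
  • On its own, the pipeline does not use a workflow engine
    • Where dependencies span microservices, we use Argo Workflows
  Pipeline (details omitted to keep the explanation simple):
  BigQuery → data_loader → [intermediate output] → preprocessor → [intermediate output] → train → [intermediate output] → predict → BigQuery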

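 6. Problems
  BigQuery → data_loader → [intermediate output] → preprocessor → [intermediate output] → train → [intermediate output] → predict → BigQuery
  • The total run time is very long
  • Changing a step in the middle during development means redoing everything from the beginning
    • In addition, there were cases where data disappeared through (k8s) GC when a long run was left unattended on a Pod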

 7. Approach
  BigQuery → data_loader → [intermediate output] → preprocessor → [intermediate output] → train → [intermediate output] → predict → BigQuery
  • Speeding up each step (failed)
  • Caching the intermediate outputs

 8. Goal
  Use intermediate artifacts as a cache to reduce both the time it takes to re-run a data pipeline from the middle during development and how often each step is executed.

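 9. What to do to achieve the goal / requirements
  Skip the execution of steps whose required intermediate artifacts have already been generated
    • A step can be skipped if its expected output has already been produced
    • When intermediate artifacts are used, the appropriate ones are selected and used automatically
    • The cache can be disabled in environments where caching mistakes must be avoided, such as production
  Do not lose intermediate artifacts once they have been generated (for example, when a Pod is deleted)
    • At the end of each step, intermediate artifacts can be uploaded to GCS or some other external storage
    • When an intermediate artifact needed as input is missing from local storage, a suitable one is looked up in external storage such as GCS and used as the input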
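
The requirements above can be sketched roughly as follows. This is a minimal illustration, not the team's implementation: `ensure_artifact`, its parameters, and the use of a plain directory standing in for GCS are all assumptions made for the example.

```python
import shutil
from pathlib import Path
from typing import Callable


def ensure_artifact(
    name: str,
    local_dir: Path,
    remote_dir: Path,
    produce: Callable[[Path], None],
    use_cache: bool = True,
) -> str:
    """Make sure `local_dir / name` exists and report how it was obtained.

    `remote_dir` is a hypothetical stand-in for external storage such as GCS.
    Returns "local", "remote", or "computed".
    """
    local_path = local_dir / name
    remote_path = remote_dir / name

    # Requirement: skip the step when its output already exists locally.
    if use_cache and local_path.exists():
        return "local"

    # Requirement: fall back to external storage when the local copy is gone
    # (for example after the Pod was deleted).
    if use_cache and remote_path.exists():
        shutil.copy2(remote_path, local_path)
        return "remote"

    # Otherwise run the step, then upload its output so it is not lost.
    produce(local_path)
    shutil.copy2(local_path, remote_path)
    return "computed"
```

Passing `use_cache=False` corresponds to the requirement that caching can be switched off in environments such as production.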

 10. Features of an existing tool and its problems

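 11. Existing tool
  For the experiment-management side, the following material is a useful reference:
    4th MLOps Study Group
    https://mlops.connpass.com/event/202359/
    "A practical application example of experiment management with Data Version Control" https://speakerdeck.com/sansandsoc/an-experiment-management-example-by-data-version-control
    Kanji Takahashi (高橋 寛治), Researcher, DSOC R&D
  • DVC https://dvc.org
  • A CLI tool for version-controlling machine learning projects
  • Includes functionality for managing pipeline execution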

 12. DVC features
  • Command for running a pipeline
    • `dvc repro`
  • The following files are needed to run it
    • dvc.yaml
    • dvc.lock (generated automatically after the command runs)

 13. Execution flow
  dvc.yaml → (read) → `dvc repro` → (run / generate) → dvc.lock

 14. DVC features: dvc.yaml
  Describes the commands run in the pipeline, their output files, and the dependencies between stages.
  A distinctive point is that dependencies between stages are expressed per file rather than by stage name.
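
To make the file-based dependency model concrete, here is a small sketch. The dictionary mirrors the `stages` / `cmd` / `deps` / `outs` layout of a dvc.yaml file, but the stage names, commands, and paths are hypothetical, and `stage_edges` is only an illustration of how stage-level edges follow from shared file paths.

```python
from typing import Dict, List, Set, Tuple

# Same shape as the `stages` section of a dvc.yaml file.
# All names, commands and paths below are made up for illustration.
STAGES: Dict[str, Dict[str, List[str]]] = {
    "data_loader": {
        "cmd": ["python data_loader.py"],
        "deps": ["data_loader.py"],
        "outs": ["data/raw.parquet"],
    },
    "preprocessor": {
        "cmd": ["python preprocessor.py"],
        # The dependency on data_loader is expressed only through this path.
        "deps": ["preprocessor.py", "data/raw.parquet"],
        "outs": ["data/features.parquet"],
    },
    "train": {
        "cmd": ["python train.py"],
        "deps": ["train.py", "data/features.parquet"],
        "outs": ["models/model.pkl"],
    },
}


def stage_edges(stages: Dict[str, Dict[str, List[str]]]) -> Set[Tuple[str, str]]:
    """Derive (upstream, downstream) stage pairs from matching file paths."""
    producer = {out: name for name, spec in stages.items() for out in spec["outs"]}
    edges = set()
    for name, spec in stages.items():
        for dep in spec["deps"]:
            upstream = producer.get(dep)
            if upstream is not None and upstream != name:
                edges.add((upstream, name))
    return edges
```

Note that no stage refers to another stage by name: the graph only exists because an `outs` path of one stage reappears in the `deps` of another.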

 15. DVC features: dvc.lock
  Records the computed hash and file size of each stage's output files and of the files it depends on (generated automatically by the dvc command).
  The cache is looked up (locally or on a remote) based on the hash and file size.
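
As an illustration of what such a record contains, the sketch below computes a hash and a size for one file. It assumes MD5 as the hash function and uses a hypothetical helper name, `fingerprint`; it is not DVC's own code.

```python
import hashlib
from pathlib import Path
from typing import Dict, Union


def fingerprint(path: Path, chunk_size: int = 1 << 20) -> Dict[str, Union[str, int]]:
    """Return the path, MD5 hex digest, and size in bytes of a file."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        # Read in chunks so large intermediate outputs do not fill memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return {
        "path": str(path),
        "md5": digest.hexdigest(),
        "size": path.stat().st_size,
    }
```

Two files with the same content produce the same digest and size regardless of where they live, which is what makes a content-based lookup in a local or remote cache possible.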

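 16. Execution flow
  1st run: StageA → [intermediate output A] → StageB → [intermediate output B] → StageZ
    • File paths, hashes, and file sizes are recorded in dvc.lock
  Nth run: StageA → [intermediate output A] → StageB' → [intermediate output B'] → StageZ
    • dvc.lock is checked: do the recorded hash and file size match, and does the intermediate output A file exist?
    • StageA: execution is skipped
    • StageB' and StageZ: executed (nothing in the record matches, or the file does not exist)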
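
The run-or-skip decision in this flow can be sketched as below. This is a simplified model and not DVC's algorithm: fingerprints are plain `(hash, size)` tuples, stages are assumed to be listed in dependency order, and every name (`plan_run`, the stage and file names) is hypothetical.

```python
from typing import Dict, List, Tuple

Fingerprint = Tuple[str, int]  # (hash, size)


def plan_run(
    stages: List[dict],
    lock: Dict[str, Dict[str, Fingerprint]],
    current: Dict[str, Fingerprint],
) -> Dict[str, str]:
    """Decide "run" or "skip" per stage from recorded and current fingerprints.

    `lock[stage]` maps each dependency and output path to its recorded
    fingerprint; `current` maps paths that exist right now to theirs.
    """
    state = dict(current)
    decisions: Dict[str, str] = {}
    for stage in stages:
        recorded = lock.get(stage["name"])
        files = stage["deps"] + stage["outs"]
        up_to_date = recorded is not None and all(
            path in state and state[path] == recorded.get(path) for path in files
        )
        if up_to_date:
            decisions[stage["name"]] = "skip"
            continue
        decisions[stage["name"]] = "run"
        # Simplification: outputs of a re-run stage are treated as changed,
        # so every downstream stage is re-run as well.
        for out in stage["outs"]:
            state.pop(out, None)
    return decisions


STAGES = [
    {"name": "StageA", "deps": ["a.py"], "outs": ["out_a"]},
    {"name": "StageB", "deps": ["b.py", "out_a"], "outs": ["out_b"]},
    {"name": "StageZ", "deps": ["z.py", "out_b"], "outs": ["out_z"]},
]
FIRST_RUN = {
    "a.py": ("h-a", 10), "out_a": ("h-oa", 100),
    "b.py": ("h-b", 20), "out_b": ("h-ob", 200),
    "z.py": ("h-z", 30), "out_z": ("h-oz", 300),
}
LOCK = {
    stage["name"]: {p: FIRST_RUN[p] for p in stage["deps"] + stage["outs"]}
    for stage in STAGES
}
```

With `FIRST_RUN` as the current state every stage is skipped; changing only the fingerprint of `b.py` makes `plan_run` skip StageA and run StageB and StageZ, matching the Nth run on the slide.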

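 17. Managing intermediate output paths is painful
  Problem
    • Large numbers of file paths are listed for dependency management, so maintainability and readability are quite low
  Causes
    • In DVC, dependencies between stages are expressed not by the name of the task depended on but by the input and output files of each stage
    • Each stage's dependencies and output files must be listed in dvc.yaml, so to express a dependency, the output file list of the depended-on stage has to be duplicated in the dependency file list of the depending stage
    • If there are files that every stage depends on (e.g. poetry.lock), the same entry has to be written in every stage
    • File paths have to be kept consistent between dvc.yaml and the code
    • This has the advantage of allowing fine-grained control, but it is more than we need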

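 18. Managing intermediate output paths is painful
  Points that are more than we need
    • Dependencies are expressed per file rather than per task
    • File paths have to be written twice, once as a dependency and once as an output
  Points that look improvable
    • We want dependencies to be per stage
    • We do not want to manage file paths in duplicate
      • Write each path in one place only
      • Write files that are common dependencies in one place only as well
    • We want to improve the clarity and accessibility of dvc.yaml
      • Make dvc.yaml more readable
      • Make the paths easy to access from Python code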

 19. What we did

 20. We built a wrapper tool for dvc
  Code generator
    • Reads a dedicated Config designed to be convenient in our development environment
    • Generates dvc.yaml and stageouts.py (explained later)
  Pipeline execution
    • The pipeline itself is run with dvc repro as-is
    • GCS authentication
    • Automated Pull/Push of the cache
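
The execution side of such a wrapper might look like the sketch below. It only chains the standard `dvc pull`, `dvc repro`, and `dvc push` commands. GCS authentication is left out, and the function names, the injectable `runner`, and the choice to push only after a successful run are assumptions made for illustration.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def run_command(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def run_pipeline(runner: Runner = run_command) -> int:
    """Pull cached outputs, reproduce the pipeline, then push new outputs."""
    # Best effort: on a fresh remote there may be nothing to pull yet,
    # so a failing pull does not abort the run.
    runner(["dvc", "pull"])

    exit_code = runner(["dvc", "repro"])

    # Assumed policy: only upload the cache when the pipeline succeeded.
    if exit_code == 0:
        runner(["dvc", "push"])
    return exit_code
```

Taking the runner as a parameter keeps the sequence testable without a DVC installation; in real use the default `run_command` would be used from inside the repository.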

 21. Directory structure

 22. Execution flow
  Dedicated Config → (read) → wrapper → (generate) → dvc.yaml, stageouts.py → (read) → `dvc repro` → (run / generate) → dvc.lock

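 23. Code generator: dedicated Config
  Needed to generate dvc.yaml, which is required to run the pipeline with dvc, and stageouts.py, which manages the intermediate output paths.
  (The differences from dvc.yaml exist only to fit our development environment.)
  It improves on the problems of dvc.yaml:
    • Declaring dependencies between tasks
    • Declaring common dependencies
    • Grouping of path management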
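
The slides do not show the Config format, so the following is purely a hypothetical sketch of the idea: dependencies are declared by stage name (`needs`), common dependencies are declared once (`common_deps`), and a generator expands both into the per-file `deps` lists that dvc.yaml expects. All keys other than `cmd`, `deps`, and `outs` are invented for this example.

```python
from typing import Dict, List

# Hypothetical dedicated Config: every path is written exactly once.
CONFIG = {
    "common_deps": ["poetry.lock"],
    "stages": {
        "data_loader": {
            "cmd": "python data_loader.py",
            "needs": [],
            "outs": ["data/raw.parquet"],
        },
        "preprocessor": {
            "cmd": "python preprocessor.py",
            "needs": ["data_loader"],
            "outs": ["data/features.parquet"],
        },
        "train": {
            "cmd": "python train.py",
            "needs": ["preprocessor"],
            "outs": ["models/model.pkl"],
        },
    },
}


def to_dvc_stages(config: dict) -> Dict[str, Dict[str, dict]]:
    """Expand stage-level dependencies into dvc.yaml-style file lists."""
    stages: Dict[str, dict] = {}
    for name, spec in config["stages"].items():
        deps: List[str] = list(config.get("common_deps", []))
        for upstream in spec.get("needs", []):
            # A dependency on a stage becomes a dependency on all its outputs.
            deps.extend(config["stages"][upstream]["outs"])
        stages[name] = {"cmd": spec["cmd"], "deps": deps, "outs": list(spec["outs"])}
    return {"stages": stages}
```

The returned dictionary has the same shape as a dvc.yaml document and could be serialised with any YAML library; the duplication still exists in the generated file, but no one has to maintain it by hand.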

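 24. Code generator: stageouts.py
  To make it easier to keep intermediate output locations consistent (without managing duplicates by hand), the intermediate output locations configured for each stage in the dedicated Config are generated automatically as Python code.
  An output file path is created for each stage, so the code imports and uses them.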
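
What generating such a module could look like is sketched below. The layout of the generated code (one class per stage, one `Path` constant per output) is an assumption for illustration; the slides only state that the paths are generated as Python code and imported.

```python
from typing import Dict


def to_class_name(stage: str) -> str:
    """Turn a stage name such as "data_loader" into "DataLoader"."""
    return "".join(part.capitalize() for part in stage.split("_"))


def render_stageouts(outs_by_stage: Dict[str, Dict[str, str]]) -> str:
    """Render Python source exposing each stage's output paths as constants."""
    lines = [
        '"""Generated file. Do not edit by hand."""',
        "from pathlib import Path",
        "",
    ]
    for stage, outs in outs_by_stage.items():
        lines.append("")
        lines.append(f"class {to_class_name(stage)}:")
        for key, path in outs.items():
            lines.append(f"    {key.upper()} = Path({path!r})")
    return "\n".join(lines) + "\n"


# Hypothetical stage outputs, keyed by a short name for each file.
EXAMPLE_OUTS = {
    "data_loader": {"raw": "data/raw.parquet"},
    "train": {"model": "models/model.pkl"},
}

if __name__ == "__main__":
    print(render_stageouts(EXAMPLE_OUTS))
```

Pipeline code would then write something like `from stageouts import DataLoader` and read `DataLoader.RAW` instead of repeating the string `"data/raw.parquet"`, so the path lives in one place only.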

 25. How it turned out

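 26. How it turned out
  The original goal itself was achieved 🎉
    • "Use intermediate artifacts as a cache to reduce both the time it takes to re-run a data pipeline from the middle during development and how often each step is executed"
    • We have not yet replaced everything
  Side effect
    • The inputs and outputs of each stage were tidied up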

 27. Current issues

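 28. Current issues
  • To make proper use of the cache, stages need to be cut quite finely
    • The cache only takes effect when a stage succeeds (otherwise nothing is recorded in dvc.lock), so if something fails inside a stage, the whole stage has to be redone
  • Dynamically created files cannot be handled
    • An error occurs if a file path written in dvc.yaml does not exist when the stage completes
    • For example, if the dependencies include a file whose generation depends on whether the environment is development or production, the pipeline can succeed in development but fail in production
  • The cost of adoption is high
    • The code has to be split into files that can each be run with the python command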
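
The dynamically-created-file issue can be illustrated with a small check of declared outputs against what actually exists after a stage has run. This is a hypothetical helper written for this example, not part of DVC or of the wrapper described in the talk.

```python
from pathlib import Path
from typing import Iterable, List


def missing_outputs(declared: Iterable[str], root: Path) -> List[str]:
    """Return the declared output paths that do not exist under `root`."""
    return [path for path in declared if not (root / path).exists()]
```

If a stage declares, say, `debug/report.html` as an output but only writes it in the development environment, this check returns an empty list in development and `["debug/report.html"]` in production, which is the situation where the stage would be reported as failed.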

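 29. Summary
  WHY
    The pipeline took a long time to run, which was hurting development productivity
  WHAT
    By handling intermediate outputs well as a cache, we reduced the run time and run frequency during development
  HOW
    Using DVC and building a wrapper tool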