Takayuki Kasai
June 16, 2021
DVC を活用した機械学習パイプライン開発の高速化 / Using DVC to accelerate machine learning pipeline development
8th MLOps Study Group Tokyo (Online)
https://mlops.connpass.com/event/211953/
Transcript
©2021 Wantedly, Inc.
Accelerating Machine Learning Pipeline Development with DVC
8th MLOps Study Group Tokyo (Online)
Takayuki Kasai (@unblee) Wantedly, Inc. 2021.6.16
What I'll talk about
This talk covers the problems that the Wantedly Visit Matching team ran into while developing machine learning pipelines, and the approach we took to solve them.
Agenda
• Issues in the development environment
• Features and issues of existing tools
• What we did
• How it turned out
• Remaining issues
Issues in the development environment
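Machine learning pipeline: the development environment
• The pipeline flow itself is simple
• Development is mainly done in Pods on k8s
  • This keeps production and development reproducible and lets us scale resources (mainly memory)
• A single pipeline does not use a workflow engine
  • When a pipeline has dependencies that span microservices, we use Argo Workflows
(Details are omitted to keep the explanation simple.)
[Figure: pipeline diagram: BigQuery → preprocessor → data_loader → train → predict → BigQuery, with intermediate outputs passed between the steps]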
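Issues in the development environment
[Figure: the same pipeline diagram]
• The end-to-end run takes a long time
• Changing a step in the middle during development means rerunning everything from the start
  • On top of that, when a long run was left unattended on a Pod, (k8s) garbage collection sometimes deleted the data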
Approach
[Figure: the same pipeline diagram]
• Speeding up each individual step (this attempt failed)
• Caching intermediate outputs
Goal
Reuse intermediate artifacts as a cache to reduce two things during development: the time it takes to rerun the data pipeline from a midpoint, and how often each step has to run.
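What we need to achieve this (requirements)
Skip steps whose required intermediate artifacts have already been produced
• If the expected output has already been produced, the step can be skipped
• When an intermediate artifact is used, the appropriate one is selected automatically
• In environments such as production, where cache-related mistakes must be avoided, the cache can be disabled
Never lose an intermediate artifact once it has been produced (for example, when a Pod is deleted)
• At the end of each step, intermediate artifacts can be uploaded to external storage such as GCS
• When a required input artifact is not in local storage, a suitable one can be looked up in external storage such as GCS and used as the input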
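The sketch below makes these requirements concrete. It shows the lookup order they imply: local disk first, then external storage, plus a switch that bypasses the cache. It is a minimal sketch under several assumptions. Both function names are hypothetical. `remote_root` is a plain directory standing in for a GCS bucket, so the sketch runs without credentials. Artifacts are identified only by relative path, whereas DVC identifies them by content hash.

```python
import shutil
from pathlib import Path
from typing import Optional


def resolve_intermediate(rel_path: str, local_root: Path, remote_root: Path,
                         use_cache: bool = True) -> Optional[Path]:
    """Hypothetical: return a local copy of an intermediate output, or None to recompute."""
    if not use_cache:
        # e.g. production: ignore every cached artifact and recompute
        return None
    local = local_root / rel_path
    if local.exists():
        return local  # already on the Pod's local disk
    remote = remote_root / rel_path  # stand-in for external storage such as GCS
    if remote.exists():
        # Fall back to external storage and materialise the file locally.
        local.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(remote, local)
        return local
    return None  # not cached anywhere: the step has to run


def upload_after_step(rel_path: str, local_root: Path, remote_root: Path) -> None:
    """Hypothetical: copy a finished step's output to external storage so Pod deletion cannot lose it."""
    dst = remote_root / rel_path
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(local_root / rel_path, dst)
```

After `upload_after_step`, deleting the local file and calling `resolve_intermediate` again restores it from the remote directory.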
Features and issues of existing tools
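Existing tools
For the experiment-management side, the following material is a good reference:
4th MLOps Study Group https://mlops.connpass.com/event/202359/
"A practical case study of experiment management with Data Version Control" https://speakerdeck.com/sansandsoc/an-experiment-management-example-by-data-version-control (Takahashi, DSOC R&D researcher)
• DVC (https://dvc.org)
• A CLI tool for version-controlling machine learning projects
• It also includes pipeline execution management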
DVC features
• Pipeline execution command
  • `dvc repro`
• Running it requires the following files
  • dvc.yaml
  • dvc.lock (generated automatically after the command runs)
Execution flow
[Figure: `dvc repro` reads dvc.yaml and dvc.lock, runs the stages, and generates dvc.lock]
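DVC features: dvc.yaml
dvc.yaml describes three things: the commands the pipeline runs, their output files, and the dependencies between stages.
A distinctive trait is that dependencies between stages are expressed per file, not by stage name.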
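The sketch below shows what file-level dependencies look like, written as the Python equivalent of a two-stage dvc.yaml. `cmd`, `deps` and `outs` are real dvc.yaml keys; the commands and paths are hypothetical. `stage_edges` shows how the stage graph comes out of matching one stage's `outs` against another stage's `deps`. DVC's own matching also handles directory outputs, which this sketch ignores.

```python
import json
from typing import Set, Tuple

# Hypothetical two-stage excerpt in dvc.yaml form (real keys: cmd, deps, outs).
DVC_YAML = {
    "stages": {
        "preprocessor": {
            "cmd": "python preprocessor.py",
            "deps": ["preprocessor.py", "poetry.lock"],
            "outs": ["data/preprocessed.parquet"],
        },
        "data_loader": {
            "cmd": "python data_loader.py",
            # The upstream output path (and poetry.lock) must be repeated by hand.
            "deps": ["data_loader.py", "poetry.lock", "data/preprocessed.parquet"],
            "outs": ["data/dataset.pkl"],
        },
    }
}


def stage_edges(dvc_yaml: dict) -> Set[Tuple[str, str]]:
    """Derive (upstream, downstream) stage pairs purely from shared file paths."""
    producer = {
        out: name
        for name, stage in dvc_yaml["stages"].items()
        for out in stage.get("outs", [])
    }
    return {
        (producer[dep], name)
        for name, stage in dvc_yaml["stages"].items()
        for dep in stage.get("deps", [])
        if dep in producer
    }


# json.dumps output is also valid YAML, so this text could be saved as dvc.yaml.
print(json.dumps(DVC_YAML, indent=2))
print(stage_edges(DVC_YAML))  # {('preprocessor', 'data_loader')}
```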
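DVC features: dvc.lock
dvc.lock records the hash and file size of each stage's output files and of the files the stage depends on. It is generated automatically by dvc commands.
The cache (local or remote) is looked up based on these hashes and file sizes.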
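The sketch below is a simplified version of the per-file record that dvc.lock keeps. A real dvc.lock entry for a single file carries `path`, `md5` and `size`. This sketch hashes the raw bytes with plain MD5. DVC 2.x also normalised line endings of text files before hashing, so digests for text files may differ from DVC's. The helper names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, Union


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """MD5 of the raw file contents, read in chunks so large artifacts need little memory."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def lock_entry(path: Path) -> Dict[str, Union[str, int]]:
    """One dvc.lock-style entry for a single file: path, md5 and size in bytes."""
    return {"path": str(path), "md5": file_md5(path), "size": path.stat().st_size}
```

The hash is derived from content, not from where the file was produced. That lets a cached copy in local or remote storage be matched no matter which Pod created it.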
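Execution flow
[Figure, first run: StageA, StageB and StageZ all run and produce intermediate outputs A and B. dvc.lock records each file's path, hash and file size.]
[Figure, N-th run after StageB has changed to StageB': StageA is skipped. DVC first confirms that the recorded hash and file size match and that the file for intermediate output A exists. StageB' and StageZ run, because the record has no matching entry or the file does not exist.]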
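The sketch below models the skip decision in the figure. It is a simplified model, not DVC's implementation. A stage is skipped only when a record exists and every recorded dependency and output is present with the same md5 and size. Real DVC also compares the stage's command and params, and may restore missing outputs from its cache instead of rerunning. Both function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Optional


def current_entry(path: Path) -> Optional[dict]:
    """md5/size of a file as it is now, or None if the file is missing."""
    if not path.is_file():
        return None
    data = path.read_bytes()
    return {"path": str(path), "md5": hashlib.md5(data).hexdigest(), "size": len(data)}


def can_skip(recorded_stage: Optional[dict]) -> bool:
    """Decide whether a stage can be skipped, given its dvc.lock-style record."""
    if recorded_stage is None:
        return False  # the stage never finished successfully, so nothing was recorded
    for entry in recorded_stage.get("deps", []) + recorded_stage.get("outs", []):
        # Any changed hash/size or missing file forces the stage to run again.
        if current_entry(Path(entry["path"])) != entry:
            return False
    return True
```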
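Managing intermediate output paths is painful
• Dependency management produces long lists of file paths, so maintainability and readability are quite poor
Causes
• DVC expresses dependencies between stages per input/output file of each stage, not by the name of the upstream task
• Each stage's dependencies and outputs must be listed in dvc.yaml. To express a dependency, the upstream stage's output file list therefore has to be copied into the downstream stage's dependency list.
• If every stage shares a common dependency (e.g. poetry.lock), the same entry has to be written in every stage
• File paths must be kept consistent between dvc.yaml and the code
• This does make fine-grained control easy, but for us it is overkill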
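Managing intermediate output paths is painful
What is overkill for us
• Dependencies are expressed per file rather than per task
• The same file path has to be written twice: once as an upstream output and once as a downstream dependency
What looks improvable
• Declare dependencies per stage
  • Avoid managing duplicated file paths
  • Write each path in only one place
  • Write shared dependencies in only one place, too
• Improve how easy dvc.yaml is to follow and to access
  • Make dvc.yaml more readable
  • Make paths easy to access from Python code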
What we did
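We built a wrapper tool for DVC
Code generator
• Reads a custom Config designed for convenient use in our development environment
• Generates dvc.yaml and stageouts.py (explained later)
Pipeline execution
• The pipeline itself is run with plain `dvc repro`
• Handles GCS authentication
• Automates cache pull/push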
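The sketch below covers the execution half of such a wrapper. It only sequences the real `dvc pull`, `dvc repro` and `dvc push` commands. It assumes three things:

• A DVC remote pointing at GCS is already configured.
• Credentials come from the environment (for example `GOOGLE_APPLICATION_CREDENTIALS`).
• Failures of `dvc pull` are tolerated, because outputs that were never pushed, such as on a first run, cannot be pulled.

The `runner` parameter is a hypothetical seam so the sequence can be checked without DVC installed.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

# A runner receives an argv list and whether a non-zero exit status should raise.
Runner = Callable[[Sequence[str], bool], None]


def subprocess_runner(argv: Sequence[str], check: bool) -> None:
    """Run a command for real; raises CalledProcessError if check=True and it fails."""
    subprocess.run(list(argv), check=check)


def run_pipeline(runner: Runner = subprocess_runner, push: bool = True) -> List[List[str]]:
    """Pull cached outputs, reproduce with plain `dvc repro`, then push new outputs."""
    steps: List[Tuple[List[str], bool]] = [
        # Tolerate pull failures: outputs never pushed before cannot be pulled.
        (["dvc", "pull"], False),
        # The pipeline itself is plain `dvc repro`, which skips up-to-date stages.
        (["dvc", "repro"], True),
    ]
    if push:
        # Upload newly produced outputs so a deleted Pod cannot lose them.
        steps.append((["dvc", "push"], True))
    for argv, check in steps:
        runner(argv, check)
    return [argv for argv, _ in steps]
```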
Directory structure
Execution flow
[Figure: the wrapper reads the custom Config and generates dvc.yaml and stageouts.py; `dvc repro` then reads dvc.yaml, runs the stages, and generates dvc.lock]
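Code generator: the custom Config
The Config is the input for generating two files: dvc.yaml, which dvc needs to run the pipeline, and stageouts.py, which manages the intermediate output paths.
(Its differences from dvc.yaml exist only to fit our development environment.)
It addresses the issues with dvc.yaml:
• Declaring dependencies between tasks
• Declaring common dependencies
• Grouping paths for management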
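The sketch below shows the expansion such a generator performs. The Config schema (`needs`, `common_deps`, named `outs` groups) is hypothetical and is not the talk's actual format. Each path is written once in the Config. The generator then mechanically produces the file-level duplication that dvc.yaml requires.

```python
import json
from typing import Dict, List

# Hypothetical Config: stage-level `needs`, shared `common_deps`, named output groups.
CONFIG = {
    "common_deps": ["poetry.lock"],
    "stages": {
        "preprocessor": {"script": "preprocessor.py",
                         "outs": {"table": "data/preprocessed.parquet"}},
        "data_loader": {"script": "data_loader.py", "needs": ["preprocessor"],
                        "outs": {"dataset": "data/dataset.pkl"}},
        "train": {"script": "train.py", "needs": ["data_loader"],
                  "outs": {"model": "models/model.pkl"}},
    },
}


def to_dvc_yaml(config: dict) -> Dict[str, Dict[str, dict]]:
    """Expand stage-level dependencies into DVC's file-level deps/outs."""
    stages = config["stages"]
    dvc_stages: Dict[str, dict] = {}
    for name, stage in stages.items():
        # Common dependencies are declared once and added to every stage here.
        deps: List[str] = [stage["script"], *config.get("common_deps", [])]
        for upstream in stage.get("needs", []):
            # Upstream output paths are copied in automatically, never by hand.
            deps.extend(stages[upstream]["outs"].values())
        dvc_stages[name] = {
            "cmd": f"python {stage['script']}",
            "deps": deps,
            "outs": list(stage["outs"].values()),
        }
    return {"stages": dvc_stages}


# json.dumps output is also valid YAML, so it can be written out as dvc.yaml.
print(json.dumps(to_dvc_yaml(CONFIG), indent=2))
```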
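Code generator: stageouts.py
stageouts.py is Python code generated automatically from the intermediate output locations that the custom Config sets for each stage. Its purpose is to keep output paths consistent without managing duplicates by hand.
Output file paths are generated for each stage, so stage code simply imports and uses them.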
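The sketch below generates such a module from a hypothetical Config fragment that maps stage names to named output paths. Here each stage becomes a class used purely as a namespace. The slides do not show the actual layout of the team's stageouts.py. Names that are not valid Python identifiers are rejected, because they could not be imported.

```python
import keyword
from typing import Dict

# Hypothetical per-stage output groups, as a custom Config might declare them.
STAGE_OUTS: Dict[str, Dict[str, str]] = {
    "preprocessor": {"table": "data/preprocessed.parquet"},
    "data_loader": {"dataset": "data/dataset.pkl"},
}


def _check_identifier(name: str) -> None:
    """Reject names that would not be importable as Python attributes."""
    if not name.isidentifier() or keyword.iskeyword(name):
        raise ValueError(f"not a valid Python identifier: {name!r}")


def render_stageouts(stage_outs: Dict[str, Dict[str, str]]) -> str:
    """Emit Python source with one namespace class per stage holding its output paths."""
    lines = ["# Auto-generated from the Config. Do not edit by hand.", ""]
    for stage, outs in stage_outs.items():
        _check_identifier(stage)
        lines.append(f"class {stage}:")
        if not outs:
            lines.append("    pass")
        for key, path in outs.items():
            _check_identifier(key)
            lines.append(f"    {key} = {path!r}")
        lines.append("")
    return "\n".join(lines)


print(render_stageouts(STAGE_OUTS))
```

Writing this output to stageouts.py lets a stage script do `from stageouts import preprocessor` and open `preprocessor.table` instead of hard-coding the path.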
How it turned out
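How it turned out
The original goal itself was achieved 🎉
• "Reuse intermediate artifacts as a cache to reduce two things during development: the time it takes to rerun the data pipeline from a midpoint, and how often each step has to run."
• We have not yet replaced everything
Side effect
• The inputs and outputs of each stage are now organized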
Remaining issues
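Remaining issues
• To make good use of the cache, stages have to be split quite finely
  • The cache only becomes valid when a stage succeeds; otherwise nothing is recorded in dvc.lock. If a stage fails partway, the whole stage has to be rerun.
• Dynamically created files are not supported
  • If a file path listed in dvc.yaml does not exist when the stage finishes, it is an error
  • For example, a dependency may be a file that is only created in one environment (development or production). The pipeline can then succeed in development but fail in production.
• Adoption cost is high
  • The code has to be split into files that can each be run with the python command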
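The sketch below is a toy reproduction of the environment-dependent failure, without DVC. The stage, the file names and the `env` values are hypothetical. `missing_outs` only mimics the check that makes DVC fail when a declared output is absent.

```python
from pathlib import Path
from typing import List

# Hypothetical outs declared for one stage in dvc.yaml.
DECLARED_OUTS = ["features.parquet", "debug_report.html"]


def run_stage(workdir: Path, env: str) -> None:
    """Toy stage whose debug report is only written in the development environment."""
    (workdir / "features.parquet").write_bytes(b"placeholder")
    if env == "dev":
        (workdir / "debug_report.html").write_text("<html></html>")


def missing_outs(workdir: Path, declared: List[str]) -> List[str]:
    """Declared outputs absent after the stage finished (DVC reports these as an error)."""
    return [p for p in declared if not (workdir / p).exists()]
```

With `env="dev"` nothing is missing. With `env="prod"`, `missing_outs` returns `['debug_report.html']`, so the same dvc.yaml passes in one environment and fails in the other.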
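Summary
WHY: Long pipeline run times were hurting development productivity
WHAT: We handled intermediate outputs as a cache, cutting how long and how often things run during development
HOW: Using DVC and building a wrapper tool around it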