mlct.pdf
by
Hirofumi Nakagawa/中河 宏文
Link
Embed
Share
Beginning
This slide
Copy link URL
Copy link URL
Copy iframe embed code
Copy iframe embed code
Copy javascript embed code
Copy javascript embed code
Share
Tweet
Share
Tweet
Slide 1
Slide 1 text
ϝϧΧϦͷMLج൫ MLCT vol.5 hnakagawa
Slide 2
Slide 2 text
ࣗݾհ • Hirofumi Nakagawa (hnakagawa) • 20177݄ೖࣾ • ॴଐSRE • σόΠευϥΠό։ൃ͔Βϑϩϯ τΤϯυ։ൃ·ͰΔԿͰ • NOT σʔλαΠΤϯςΟετ • https://github.com/hnakagawa
Slide 3
Slide 3 text
͓ࣄ • ML Platform։ൃ • σʔλαΠΤϯςΟετͱSREͷεΩϧΪϟο ϓΛຒΊΔ • ML Reliability, SysML?, MLOps? • SREͷཱ͔ΒMLγεςϜͷࣗಈԽΛߦ͏
Slide 4
Slide 4 text
ML Platform • ͷML Platform • kubernetesϕʔε • طଘͷML FrameworkΛ༻͠ ؆୯ʹTraining/ServingΛߦ͏ ڥΛఏڙ
Slide 5
Slide 5 text
ͦͷ͏ͪOSSͰެ։༧ఆ(ଟ
Slide 6
Slide 6 text
ϝϧΧϦͷMLར༻ࣄྫ • ײಈग़ • ҧग़ݕ • Ձ֨αδΣετ • ΤΠταδΣετ ʑ… ̍ઍສpredictionΛߦ͍ͬͯΔ
Slide 7
Slide 7 text
ML Platform Architecture ,VCFSOFUFT $POUSPMMFS $-* $MVTUFS8PSLGMPX %BTICPBSE 4UPSBHF(BUFXBZ .FUSJDT 3VOOFS $PNQPOFOU .FSDBSJ.- $PNQPOFOU &YUFSOBM .JEEMFXBSF
Slide 8
Slide 8 text
Model Training & Serving Workflow
Slide 9
Slide 9 text
.-1MBUGPSN USBJOJOHDMVTUFS Workflow for Production $* .-1MBUGPSN TFSWJOHDMVTUFSGPSUFTU .PEFM3FHJTUSZ +PC +PC ɾɾ 3&45 "1* 4USFBNJOH 5'4FSW JOH ɾɾɾ
Slide 10
Slide 10 text
.-1MBUGPSN USBJOJOHDMVTUFS Training Workflow $* .PEFM3FHJTUSZ +PC +PC ɾɾɾ 1. GitHubͷpushΛτϦΨʹtrainingΛىಈ 2. Training͞ΕͨModelModel Registry ্͕Δ
Slide 11
Slide 11 text
Serving Workflow .-1MBUGPSN TFSWJOHDMVTUFSGPSUFTU .PEFM3FHJTUSZ ɾɾ 3&45 "1* 4USFBNJOH 5' 4FSWJOH 1. Model RegistryΛࢹͯࣗ͠ಈͰModel ΛServing 2. Serving&Test͕ޭ͢Δͱຊ൪༻k8s manifestΛग़ྗ
Slide 12
Slide 12 text
Container Workflow %BUB4PVSDF *NBHF 5FYUɹ 1SFQSPDFT TJOH *NBHF &TUJNBUPS *NBHF 17 17 1JDUVSF 1SFQSPDFT TJOH *NBHF 17 It’s own implementation
Slide 13
Slide 13 text
Model Serving APIͷߏྫ 5FOTPS'MPX 4FSWJOH 5' .PEFM 5' .PEFM 'MBTL 4, .PEFM 4, .PEFM 4, .PEFM gRPC .FSDBSJ"1* REST FlaskͰલॲཧΛߦ͍ ཪͷTensorFlow Servingʹ͍͛ͯΔ
Slide 14
Slide 14 text
Model Serving API Streaming ver ͷߏྫ 5FOTPS'MPX 4FSWJOH 5' .PEFM 5' .PEFM .-1MBUGPSN 'SBNFXPSL PS "QBDIF#FBN 4, .PEFM 4, .PEFM 4, .PEFM gRPC PubSub
Slide 15
Slide 15 text
ModelͱίϯςφɾΠϝʔδ • ڊେͳML ModelΛίϯςφɾΠϝʔδʹؚΊ Δ͔൱͔ • ؚΊͳ͍ͷͰ͋ΕԿॲʹஔ͢Δ͔ • ϙʔλϏϦςΟੑͱϩʔυ࣌ؒͷτϨʔυΦϑ • ྑ͍ΞΠσΟΞ͕͋Εڭ͑ͯԼ͍͞…
Slide 16
Slide 16 text
௨ৗͷAPIͱಛੑ͕ҧ͏ • ѻ͏ϦιʔεɺModelαΠζ͕େ͖͘ͳΔ ߹͕ଟ͍(ඦMBʙGB) • CPUɾϝϞϦϦιʔεͷফඅ͕ܹ͍͠ • ߹ʹΑͬͯGPU͏
Slide 17
Slide 17 text
ϝϞϦফඅ • ҧݕγεςϜͷPython࣮෦࣮ߦ࣌ ʹ2GBϝϞϦΛফඅ͢Δˠࠓޙ͞Βʹ૿͑ Δ༧ఆ͋Δ • Scikit-learnͰهड़͞Εͨલॲཧ෦͕େ͖͘ ͳΓ͕ͪ
Slide 18
Slide 18 text
Pythonͱฒྻੑ • વThread͕͑ͳ͍(GILͷͨΊ) • ϓϩηεຖʹModelΛϩʔυ͢Δͱඞཁͳϝ ϞϦαΠζ͕େ͖͘ͳΔˠ Blue-Green DeployͷোʹͳΔ
Slide 19
Slide 19 text
ਖ਼PythonͰͷServing Πϯϑϥతʹਏ͍ࣄ͕ଟ͍…
Slide 20
Slide 20 text
ϝϞϦΛݡ͘͏ • fork͢ΔલʹmodelΛϩʔυ͠Copy on Write Λޮ͔͢ • k8sͷone process per containerηΦϦ͋ ͑ͯഁ͍ͬͯΔ
Slide 21
Slide 21 text
Copy On Writeͷ෮श ϝϞϦ ϓϩηε ࢠϓϩηε 2.fork 1BHF" 1.allocation ಉ͡ྖҬΛࢀর
Slide 22
Slide 22 text
ϓϩηε͕ϝϞϦͷ༰Λ ॻ͖͑Δͱ… ϝϞϦ ϓϩηε ࢠϓϩηε 1BHF" 1BHF# OS͕ผͷྖҬΛAllocationͯ͠ݩσʔλΛίϐʔ͢Δ ผͷྖҬΛࢀর
Slide 23
Slide 23 text
Current Issues
Slide 24
Slide 24 text
ߴͳܧଓతϝϯςφϯε͕ඞཁ • MLػೳσʔλͷ͕มΘͬͨΓɺ༧֎ ͷ͕ൃੜͨ͠Γͯ͠ɺͦΕΒʹରԠ͠ଓ ͚Δඞཁ͕͋Δ MLػೳϦϦʔεޙେ͖ͳ ίετ͕͔͔Γଓ͚Δ
Slide 25
Slide 25 text
େ෯ͳࣗಈԽ͕ඞਢ
Slide 26
Slide 26 text
In Progress
Slide 27
Slide 27 text
ߴͳࣗಈԽ • ࣾͷσʔλ͔ΒFeature Extraction͢Δ࣮ ΛίϯϙʔωϯτԽ • ಛఆͷΛղܾ͢ΔϞσϧߏஙΛ͋Δఔ ࣗಈԽ • ϦϦʔεޙͷRe-TrainingɺHyper parameter optimizationɺDeployΛࣗಈԽ
Slide 28
Slide 28 text
AutoFlow 'FBUVSF&YUSBDUJPO $PNQPOFOUT $MBTTJGJDBUJPO $PNQPOFOUT $PODBUFOBUJPO $PNQPOFOUT .PEFM #VJMEFS $PNQPOFOUT 3FHJTUSZ Ϋϥελ্ͰϞσϧͷࣗಈߏஙͱϋΠύʔύϥ ϝʔλͷࣗಈௐΛߦ͏
Slide 29
Slide 29 text
AutoServing %FQMPZ ϦϦʔεޙͷਫ਼ࢹɾRe-TrainingɾRe-Deploy ΛࣗಈͰߦ͏ .POJUPSJOH &WBMVBUJPO )ZQFS QBSBNFUFS PQUJNJ[BUJPO 3F5SBJOJOH
Slide 30
Slide 30 text
·ͱΊ • MLʹগ͠௨ৗͱҧ͏Πϯϑϥ͕ඞཁʹͳΔ ˠ·ͩϕετɾϓϥΫςΟε͔Βͳ͍ • ͦͦMLͳػೳΛຊ֨ӡ༻͠Α͏ͱ͢Δ ͱɺେ෯ͳࣗಈԽɾΈԽΛਐΊͳ͍ͱ্ ख͘ߦ͔ͳ͍
Slide 31
Slide 31 text
͝ਗ਼ௌ͋Γ͕ͱ͏͍͟͝·ͨ͠!!