mlct.pdf

 mlct.pdf

Transcript

  1. ϝϧΧϦͷMLج൫ MLCT vol.5
 
 hnakagawa


  2. ࣗݾ঺հ • Hirofumi Nakagawa (hnakagawa) • 2017೥7݄ೖࣾ • ॴଐ͸SRE •

    σόΠευϥΠό։ൃ͔Βϑϩϯ τΤϯυ։ൃ·Ͱ΍ΔԿͰ΋԰ • NOT σʔλαΠΤϯςΟετ • https://github.com/hnakagawa
  3. ͓࢓ࣄ • ML Platform։ൃ • σʔλαΠΤϯςΟετͱSREͷεΩϧΪϟο ϓΛຒΊΔ • ML Reliability,

    SysML?, MLOps? • SREͷཱ৔͔ΒMLγεςϜͷࣗಈԽΛߦ͏
  4. ML Platform • ಺੡ͷML Platform • kubernetesϕʔε • طଘͷML FrameworkΛ࢖༻͠

    ؆୯ʹTraining/ServingΛߦ͏ ؀ڥΛఏڙ
  5. ͦͷ͏ͪOSSͰެ։༧ఆ(ଟ෼

  6. ϝϧΧϦͷMLར༻ࣄྫ • ײಈग़඼ • ҧ൓ग़඼ݕ஌ • Ձ֨αδΣετ • ΢ΤΠταδΣετ
 ౳ʑ…

    ̍೔਺ઍສpredictionΛߦ͍ͬͯΔ
  7. ML Platform Architecture ,VCFSOFUFT $POUSPMMFS $-* $MVTUFS8PSLGMPX %BTICPBSE 4UPSBHF(BUFXBZ .FUSJDT

    3VOOFS $PNQPOFOU .FSDBSJ.- $PNQPOFOU &YUFSOBM .JEEMFXBSF
  8. Model Training & Serving
 Workflow

  9. .-1MBUGPSN USBJOJOHDMVTUFS Workflow for Production $* .-1MBUGPSN TFSWJOHDMVTUFSGPSUFTU .PEFM3FHJTUSZ +PC

    +PC ɾɾ 3&45
 "1* 4USFBNJOH 5'4FSW JOH ɾɾɾ
  10. .-1MBUGPSN USBJOJOHDMVTUFS Training Workflow $* .PEFM3FHJTUSZ +PC +PC ɾɾɾ 1.

    GitHub΁ͷpushΛτϦΨʹtrainingΛىಈ 2. Training͞ΕͨModel͸Model Registry
 ΁্͕Δ
  11. Serving Workflow .-1MBUGPSN TFSWJOHDMVTUFSGPSUFTU .PEFM3FHJTUSZ ɾɾ 3&45
 "1* 4USFBNJOH 5'

    4FSWJOH 1. Model RegistryΛ؂ࢹͯࣗ͠ಈͰModel ΛServing 2. Serving&Test͕੒ޭ͢Δͱຊ൪༻k8s manifestΛग़ྗ
  12. Container Workflow %BUB4PVSDF
 *NBHF 5FYUɹ 1SFQSPDFT TJOH *NBHF &TUJNBUPS
 *NBHF

    17 17 1JDUVSF 1SFQSPDFT TJOH *NBHF 17 It’s own implementation
  13. Model Serving APIͷߏ੒ྫ 5FOTPS'MPX
 4FSWJOH 5' .PEFM 5' .PEFM 'MBTL

    4,
 .PEFM 4,
 .PEFM 4,
 .PEFM gRPC .FSDBSJ"1* REST FlaskͰલॲཧΛߦ͍
 ཪͷTensorFlow Servingʹ౤͍͛ͯΔ
  14. Model Serving API
 Streaming ver ͷߏ੒ྫ 5FOTPS'MPX
 4FSWJOH 5' .PEFM

    5' .PEFM .-1MBUGPSN 'SBNFXPSL
 PS
 "QBDIF#FBN
 4,
 .PEFM 4,
 .PEFM 4,
 .PEFM gRPC PubSub
  15. ModelͱίϯςφɾΠϝʔδ • ڊେͳML ModelΛίϯςφɾΠϝʔδʹؚΊ Δ͔൱͔ • ؚΊͳ͍ͷͰ͋Ε͹Կॲʹ഑ஔ͢Δ͔ • ϙʔλϏϦςΟੑͱϩʔυ࣌ؒͷτϨʔυΦϑ •

    ྑ͍ΞΠσΟΞ͕͋Ε͹ڭ͑ͯԼ͍͞…
  16. ௨ৗͷAPIͱಛੑ͕ҧ͏ • ѻ͏ϦιʔεɺModelαΠζ͕େ͖͘ͳΔ৔ ߹͕ଟ͍(਺ඦMBʙ਺GB) • CPUɾϝϞϦϦιʔεͷফඅ͕ܹ͍͠ • ৔߹ʹΑͬͯ͸GPU΋࢖͏

  17. ϝϞϦফඅ໰୊ • ҧ൓ݕ஌γεςϜͷPython࣮૷෦෼͸࣮ߦ࣌ ʹ໿2GBϝϞϦΛফඅ͢Δˠࠓޙ͞Βʹ૿͑ Δ༧ఆ΋͋Δ • Scikit-learnͰهड़͞Εͨલॲཧ෦෼͕େ͖͘ ͳΓ͕ͪ

  18. Pythonͱฒྻੑ • ౰વThread͕࢖͑ͳ͍(GILͷͨΊ) • ϓϩηεຖʹModelΛϩʔυ͢Δͱඞཁͳϝ ϞϦαΠζ͕େ͖͘ͳΔˠ Blue-Green Deployͷো֐ʹͳΔ

  19. ਖ਼௚PythonͰͷServing͸
 Πϯϑϥతʹਏ͍ࣄ͕ଟ͍…

  20. ϝϞϦΛݡ͘࢖͏ • fork͢ΔલʹmodelΛϩʔυ͠Copy on Write Λޮ͔͢ • k8sͷone process per

    containerηΦϦ͸͋ ͑ͯഁ͍ͬͯΔ
  21. Copy On Writeͷ෮श ϝϞϦ ਌ϓϩηε ࢠϓϩηε 2.fork 1BHF" 1.allocation ಉ͡ྖҬΛࢀর

  22. ϓϩηε͕ϝϞϦͷ಺༰Λ
 ॻ͖׵͑Δͱ… ϝϞϦ ਌ϓϩηε ࢠϓϩηε 1BHF" 1BHF# OS͕ผͷྖҬΛAllocationͯ͠ݩσʔλΛίϐʔ͢Δ ผͷྖҬΛࢀর

  23. Current Issues

  24. ߴ౓ͳܧଓతϝϯςφϯε͕ඞཁ • MLػೳ͸σʔλͷ܏޲͕มΘͬͨΓɺ༧૝֎ ͷ໰୊͕ൃੜͨ͠Γͯ͠ɺͦΕΒʹରԠ͠ଓ ͚Δඞཁ͕͋Δ MLػೳ͸ϦϦʔεޙ΋େ͖ͳ ίετ͕͔͔Γଓ͚Δ

  25. େ෯ͳࣗಈԽ͕ඞਢ

  26. In Progress

  27. ߴ౓ͳࣗಈԽ • ࣾ಺ͷσʔλ͔ΒFeature Extraction͢Δ࣮૷ ΛίϯϙʔωϯτԽ • ಛఆͷ໰୊Λղܾ͢ΔϞσϧߏஙΛ͋Δఔ౓ ࣗಈԽ • ϦϦʔεޙͷRe-TrainingɺHyper

    parameter optimizationɺDeploy౳ΛࣗಈԽ
  28. AutoFlow 'FBUVSF&YUSBDUJPO $PNQPOFOUT $MBTTJGJDBUJPO $PNQPOFOUT $PODBUFOBUJPO
 $PNQPOFOUT .PEFM #VJMEFS $PNQPOFOUT

    3FHJTUSZ Ϋϥελ্ͰϞσϧͷ൒ࣗಈߏஙͱϋΠύʔύϥ ϝʔλͷࣗಈௐ੔Λߦ͏
  29. AutoServing %FQMPZ ϦϦʔεޙͷਫ਼౓؂ࢹɾRe-TrainingɾRe-Deploy౳ ΛࣗಈͰߦ͏ .POJUPSJOH &WBMVBUJPO )ZQFS QBSBNFUFS PQUJNJ[BUJPO 3F5SBJOJOH

  30. ·ͱΊ • MLʹ͸গ͠௨ৗͱҧ͏Πϯϑϥ͕ඞཁʹͳΔ
 ˠ·ͩϕετɾϓϥΫςΟε͸෼͔Βͳ͍ • ͦ΋ͦ΋MLͳػೳΛຊ֨ӡ༻͠Α͏ͱ͢Δ ͱɺେ෯ͳࣗಈԽɾ࢓૊ΈԽΛਐΊͳ͍ͱ্ ख͘ߦ͔ͳ͍

  31. ͝ਗ਼ௌ͋Γ͕ͱ͏͍͟͝·ͨ͠!!