The ML Platform Supporting Mercari's Marketplace Integrity Measures

Transcript

  1. The ML Platform Supporting Mercari's Marketplace Integrity Measures • Mercari ML Ops Night Vol.1
 
 hnakagawa


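  2. About me • Hirofumi Nakagawa (hnakagawa) • Joined the company in July 2017 • Member of the SRE team • A jack-of-all-trades who does everything from device driver development to front-end development • NOT an ML engineer • https://github.com/hnakagawa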
  3. My work • ML Platform development • Bridging the skill gap between ML engineers and SREs • ML Reliability, SysML?, MLOps? • Automating ML systems from an SRE's standpoint
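  4. ML Platform • An in-house ML platform • Kubernetes-based • Abstracts away the differences between the local environment and the cluster environment • A set of convenience APIs • Provides an environment for easy Training/Serving using existing ML frameworks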
  5. To be released as OSS at some point (probably)

  6. Case study: real-time item monitoring system • Known as Lovemachine • Implemented on the ML Platform • [Diagram: Lovemachine on the ML Platform training cluster and on the ML Platform serving cluster; GCS, GKE, PubSub]
  7. Model Training & Serving
 Workflow

  8. Workflow for Production • [Diagram: CI; ML Platform training cluster with Jobs; Model Registry; ML Platform serving cluster for test with REST API, Streaming, and TF Serving]
  9. Training Workflow • [Diagram: CI; ML Platform training cluster with Jobs; Model Registry] 1. A push to GitHub triggers training 2. The trained model is uploaded to the Model Registry
  10. Serving Workflow • [Diagram: Model Registry; ML Platform serving cluster for test with REST API, Streaming, and TF Serving] 1. The Model Registry is watched and new models are served automatically 2. When Serving & Test succeed, a k8s manifest for production is output
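The two serving steps on slide 10 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the platform's actual code: the registry is modeled as a plain dict, `promote_new_models`, `build_manifest`, and the `serve_and_test` hook are hypothetical names, and the container image path is a made-up placeholder.

```python
import json


def build_manifest(name, version):
    # Minimal Kubernetes Deployment manifest as a dict.
    # The image path is a placeholder, not a real registry.
    app = f"{name}-{version}"
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": app},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": {"app": app}},
            "template": {
                "metadata": {"labels": {"app": app}},
                "spec": {
                    "containers": [
                        {"name": "serving",
                         "image": f"example.invalid/{name}:{version}"}
                    ]
                },
            },
        },
    }


def promote_new_models(registry, handled, serve_and_test):
    """One polling pass over a (hypothetical) model registry.

    registry: dict of model name -> latest version
    handled: dict of model name -> version already processed (mutated)
    serve_and_test: callable(name, version) -> bool, assumed test hook
    Returns JSON manifests for versions that passed serving tests.
    """
    manifests = []
    for name, version in sorted(registry.items()):
        if handled.get(name) == version:
            continue  # no new version for this model
        # Record the version either way so a failing model is not
        # re-tested on every pass.
        handled[name] = version
        if serve_and_test(name, version):
            manifests.append(json.dumps(build_manifest(name, version)))
    return manifests
```

In a real setup this pass would run on a schedule or react to registry events, and the emitted manifest would be reviewed or applied by a separate deployment step.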
  11. Example Model Serving API configuration • [Diagram: Mercari API, REST, Flask with SK Models, gRPC, TensorFlow Serving with TF Models] • Flask performs the preprocessing and forwards requests to the TensorFlow Serving behind it
  12. Example Model Serving API configuration, Streaming ver • [Diagram: PubSub, ML Platform Framework or Apache Beam with SK Models, gRPC, TensorFlow Serving with TF Models]
  13. TensorFlow Serving • The serving environment provided by the TensorFlow project • Serves TF models without going through a Python runtime • The standard implementation exposes its API over gRPC
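  14. Models and container images • Should a huge ML model be included in the container image or not? • If not, where should it be placed? • A trade-off between portability and load time • If you have a good idea, please let me know…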
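  15. Different from ordinary APIs • The resources handled, model size in particular, are often large (hundreds of MB to several GB) • Heavy consumption of CPU and memory resources • In some cases GPUs are used as well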

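  16. The memory consumption problem • The Python part of Lovemachine consumes about 2 GB of memory at runtime → and this is expected to grow further • Preprocessing written in scikit-learn, such as TF-IDF, often becomes large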

  17. Python and parallelism • Threads are, of course, not an option (because of the GIL) • Loading the model in every process increases the memory required → this gets in the way of Blue-Green Deploy
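A rough illustration of the arithmetic behind slides 16 and 17. The figure of about 2 GB per Python process comes from the talk; the worker count of 4, and the simplification that a Blue-Green deploy briefly keeps both the old and the new deployment fully resident, are assumptions made here for the example.

```python
def required_memory_gb(model_gb, workers, blue_green=False):
    # Each worker process loads its own copy of the model.
    per_deployment = model_gb * workers
    # During a Blue-Green switch, old and new deployments coexist.
    return per_deployment * 2 if blue_green else per_deployment


print(required_memory_gb(2, 4))                   # 8
print(required_memory_gb(2, 4, blue_green=True))  # 16
```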

  18. Frankly, serving in Python
 is often painful from an infrastructure standpoint…

  19. Using memory wisely • Load the model before fork so that Copy on Write takes effect • This deliberately breaks the k8s "one process per container" convention
  20. Copy On Write refresher • [Diagram: memory, parent process, child process, Page A] 1. allocation 2. fork • Both processes reference the same region

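  21. When a process rewrites the contents of memory… • [Diagram: memory, parent process, child process, Page A, Page B] • The OS allocates a separate region and copies the original data • The process then references the separate region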
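A minimal sketch of the "load before fork" pattern from slides 19 to 21, assuming a POSIX system where `os.fork` is available. The names `load_model` and `run_prefork` are hypothetical, and a small dict stands in for the real model.

```python
import os


def load_model():
    # Hypothetical stand-in for an expensive load
    # (for example a large TF-IDF vocabulary).
    return {"token%d" % i: i for i in range(100_000)}


def run_prefork(num_workers, queries):
    model = load_model()  # loaded once, in the parent, before any fork
    children = []
    for worker_id in range(num_workers):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: sees the parent's pages, nothing is reloaded.
            os.close(read_fd)
            answer = model[queries[worker_id]]
            os.write(write_fd, str(answer).encode())
            os.close(write_fd)
            os._exit(0)
        # Parent: keep only the read end of this worker's pipe.
        os.close(write_fd)
        children.append((pid, read_fd))

    results = []
    for pid, read_fd in children:
        with os.fdopen(read_fd, "rb") as pipe:
            results.append(int(pipe.read().decode()))
        os.waitpid(pid, 0)
    return results
```

Note that in CPython even read-only access updates reference counts stored alongside the objects, so some pages are still copied over time; how much memory is actually shared should be measured rather than assumed.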

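  22. Current Issues • Because we are dealing with human behavior, trends in the data shift easily and unexpected problems arise, so we have to keep responding → The burden keeps falling on the people who build the ML models → As SREs, we want to solve this with mechanisms that include automation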
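  23. In Progress • Turning the implementation that builds embeddings from in-house data into components • Automating, to some extent, the construction of models that solve specific problems → Something AutoML-like, specialized for solving in-house problems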

  24. AutoFlow (tentative name) • [Diagram: Feature Extraction Components, Classification Components, Concatenation Components, Model Builder, Components Registry] • Performs semi-automatic model construction and automatic hyperparameter tuning on the cluster
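  25. Summary • ML requires infrastructure that is a little different from the usual → The best practices are not yet known • Operating ML features in earnest does not go well without substantial automation and systematization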

  26. Thank you for listening!!

  27. We are Hiring!!

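  28. SRE ML Reliability • SysML? MLOps? A new job description • SRE skills + basic knowledge of the ML field • People who will drive the automation and systematization of ML infrastructure • Of course, we are actively hiring for other positions too!!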
  29. Details here
 https://careers.mercari.com/