The ML Platform Supporting Mercari's Marketplace Health Initiatives

By Hirofumi Nakagawa/中河 宏文

Transcript

  1. The ML Platform Supporting Mercari's Marketplace Health Initiatives
    Mercari ML Ops Night Vol.1

    hnakagawa

  2. About Me
    • Hirofumi Nakagawa (hnakagawa)
    • Joined the company in July 2017
    • Member of the SRE team
    • A generalist who does everything from device driver development to front-end development
    • NOT an ML engineer
    • https://github.com/hnakagawa

  3. My Work
    • ML Platform development
    • Bridging the skill gap between ML engineers and SREs
    • ML Reliability, SysML?, MLOps?
    • Automating ML systems from an SRE standpoint

  4. ML Platform
    • An in-house ML Platform
    • Kubernetes-based
    • Abstracts away the differences between local and cluster environments
    • A set of convenience APIs
    • Provides an environment for easy Training/Serving using existing ML frameworks

  5. Planned for release as OSS at some point (probably)

  6. Case Study: Real-Time Item Monitoring System
    • Nicknamed Lovemachine
    • Implemented on top of ML Platform
    [Diagram: ML Platform training cluster (Lovemachine), GCS, GKE, PubSub, ML Platform serving cluster (Lovemachine)]

  7. Model Training & Serving Workflow

  8. Workflow for Production
    [Diagram: ML Platform training cluster (CI, Job, Job, ...), Model Registry, ML Platform serving cluster for test (REST API, Streaming, TF Serving, ...)]

  9. Training Workflow
    [Diagram: ML Platform training cluster (CI, Job, Job, ...), Model Registry]
    1. A push to GitHub triggers training
    2. The trained Model is uploaded to the Model Registry

  10. Serving Workflow
    [Diagram: Model Registry, ML Platform serving cluster for test (REST API, Streaming, TF Serving, ...)]
    1. The Model Registry is watched and new Models are served automatically
    2. When Serving & Test succeed, a production k8s manifest is output
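
The two serving steps above can be sketched roughly as follows. This is a minimal illustration, not the deck's actual implementation: the function names, the `smoke_test` callback, the container image, and the `MODEL_NAME`/`MODEL_VERSION` environment variables are all hypothetical. Only the overall shape (detect new registry versions, test them, then emit a Kubernetes Deployment manifest) comes from the slide.

```python
def render_manifest(model_name, version, replicas=2):
    """Build a minimal Kubernetes Deployment manifest for one model version."""
    labels = {"app": model_name, "version": str(version)}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": f"{model_name}-v{version}", "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{
                    "name": "serving",
                    # Hypothetical image and env names, for illustration only.
                    "image": "example.com/ml-platform/serving:latest",
                    "env": [
                        {"name": "MODEL_NAME", "value": model_name},
                        {"name": "MODEL_VERSION", "value": str(version)},
                    ],
                }]},
            },
        },
    }


def promote_new_versions(registry_versions, served_versions, model_name, smoke_test):
    """Return manifests for versions that are new in the registry and pass the test."""
    manifests = []
    for version in sorted(set(registry_versions) - set(served_versions)):
        # Step 1: a version not yet served is picked up automatically.
        # Step 2: a manifest is produced only if Serving & Test succeed.
        if smoke_test(model_name, version):
            manifests.append(render_manifest(model_name, version))
    return manifests
```

For example, `promote_new_versions([1, 2, 3], [1], "lovemachine", check)` would yield one manifest per new version for which the hypothetical `check` callback returns true. The result can be serialized with `json.dumps`, since Kubernetes accepts manifests in JSON as well as YAML.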

  11. Example Model Serving API Architecture
    [Diagram: Mercari API, REST, Flask (SK Model x3), gRPC, TensorFlow Serving (TF Model x2)]
    Flask handles the preprocessing and forwards requests to TensorFlow Serving behind it

  12. Example Model Serving API Architecture (Streaming ver.)
    [Diagram: PubSub, ML Platform Framework or Apache Beam (SK Model x3), gRPC, TensorFlow Serving (TF Model x2)]

  13. TensorFlow Serving
    • The Serving environment provided by the TensorFlow project
    • Can serve TF models without going through the Python runtime
    • The standard implementation exposes its API over gRPC

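  14. Models and Container Images
    • Whether or not to include a huge ML Model in the container image
    • If it is not included, where to place it
    • A trade-off between portability and load time
    • If you have a good idea, please let me know…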

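  15. Different from an Ordinary API
    • The resources handled, and the Model size, are often large (hundreds of MB to several GB)
    • Heavy consumption of CPU and memory resources
    • In some cases GPUs are used as well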

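  16. The Memory Consumption Problem
    • The Python part of Lovemachine consumes about 2GB of memory at runtime → this is expected to grow further
    • The preprocessing parts written in scikit-learn, such as TF-IDF, often become large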

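  17. Python and Parallelism
    • Naturally, Threads cannot be used for parallelism (because of the GIL)
    • Loading the Model in every process increases the memory required → this becomes an obstacle to Blue-Green Deploy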
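
To make the memory concern concrete, here is a rough back-of-the-envelope sketch. The roughly 2 GB per-process figure is the one quoted for Lovemachine on the previous slide; the worker count of 4 is purely an assumption for illustration, and `peak_memory_gb` is a hypothetical helper name.

```python
def peak_memory_gb(model_gb, workers, blue_green=True):
    """Rough peak memory when every worker process loads its own model copy."""
    per_version = model_gb * workers
    # During a blue-green switch the old and new versions run side by side,
    # so the peak is roughly twice the steady-state footprint.
    return per_version * 2 if blue_green else per_version


steady = peak_memory_gb(2, 4, blue_green=False)  # 4 assumed workers x ~2 GB each
peak = peak_memory_gb(2, 4)                      # both versions alive at once
print(f"steady state: {steady} GB, during blue-green: {peak} GB")
```

Under these assumptions a single instance needs about 8 GB in steady state and about 16 GB while both versions are running, which is why per-process model loading gets in the way of Blue-Green Deploy.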

  18. Honestly, Serving in Python is often painful from an infrastructure standpoint…

  19. Using Memory Wisely
    • Load the model before forking so that Copy on Write takes effect
    • The k8s "one process per container" convention is deliberately broken
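
A minimal sketch of the load-before-fork pattern, assuming a Unix-like OS where `os.fork` is available. `load_model`, `predict`, and `serve_with_prefork` are hypothetical stand-ins, not the deck's code. The point is only that the model is loaded once in the parent, so forked workers start out sharing its memory pages.

```python
import os


def load_model():
    """Stand-in for an expensive model load (hypothetical)."""
    return {"weights": list(range(100_000))}


def predict(model, x):
    """Toy prediction: look up a weight by index."""
    return model["weights"][x % len(model["weights"])]


def serve_with_prefork(num_workers, inputs):
    """Load the model once, then fork workers that reuse the parent's pages."""
    model = load_model()  # loaded BEFORE fork, so children inherit it
    children = []
    for worker_id in range(num_workers):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:  # child: uses the inherited model, never reloads it
            os.close(read_fd)
            result = predict(model, inputs[worker_id])
            os.write(write_fd, str(result).encode())
            os._exit(0)
        os.close(write_fd)
        children.append((pid, read_fd))
    results = []
    for pid, read_fd in children:
        results.append(int(os.read(read_fd, 64)))
        os.close(read_fd)
        os.waitpid(pid, 0)
    return results
```

One caveat worth keeping in mind: in CPython, reference-count updates write to object headers, so pages holding many small Python objects tend to get copied over time even under read-only use. The sharing is most effective for large contiguous buffers.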

  20. Copy On Write Review
    [Diagram: memory; 1. allocation of Page A; 2. fork; the parent process and the child process reference the same region]

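  21. When a Process Rewrites the Memory Contents…
    [Diagram: memory with Page A and Page B; the parent process and the child process reference separate regions]
    The OS allocates a separate region and copies the original data into it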
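
The behaviour described above can be observed from user space with a small sketch, again assuming a Unix-like OS with `os.fork`. It shows only the visible effect (a write in the child stays private to the child). The page copy itself happens inside the OS and is not directly visible here. `child_write_is_private` is a hypothetical helper name.

```python
import os


def child_write_is_private():
    """Fork, let the child modify inherited data, and compare both views."""
    data = bytearray(b"A" * 4096)  # stands in for "Page A"
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(read_fd)
        data[0] = ord("B")  # the write gives the child its own private copy
        os.write(write_fd, bytes(data[:1]))
        os._exit(0)
    os.close(write_fd)
    child_view = os.read(read_fd, 1)
    os.close(read_fd)
    os.waitpid(pid, 0)
    # The parent's data is untouched even though the child rewrote its view.
    return bytes(data[:1]), child_view
```

Calling `child_write_is_private()` returns the parent's first byte followed by the child's, so the two values differ once the child has written.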

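  22. Current Issues
    • Because we are dealing with human behavior, data trends shift easily and unexpected problems arise, so we have to keep responding
    → The burden on ML Model authors never lets up
    → As an SRE, I want to solve this with mechanisms that include automation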

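  23. In Progress
    • Turning the implementation that builds Embeddings from in-house data into components
    • Automating, to some extent, the construction of models that solve specific problems
    → Something AutoML-like, specialized for solving in-house problems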

  24. AutoFlow (tentative name)
    [Diagram: Feature Extraction Components, Classification Components, Concatenation Components, Model Builder, Components Registry]
    Performs semi-automatic model construction and automatic hyperparameter tuning on the cluster

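  25. Summary
    • ML requires infrastructure that is a little different from the usual
    → The best practices are not yet known
    • Running ML features in production in earnest does not go well without substantial automation and systematization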

  26. Thank you for your attention!!

  27. We are Hiring!!

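  28. SRE ML Reliability
    • SysML? MLOps? A new Job description
    • SRE skills + basic knowledge of the ML field
    • People who will push forward the automation and systematization of ML infrastructure
    • Of course, we are actively hiring for other roles too!!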

  29. Details here
    https://careers.mercari.com/