The ML Platform Supporting Mercari's Marketplace-Health Measures (Mercari ML Ops Night Vol.1, hnakagawa)
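About me • Hirofumi Nakagawa (hnakagawa) • Joined in July 2017 • Member of the SRE team • A generalist who does everything from device-driver development to front-end development • NOT an ML engineer • https://github.com/hnakagawa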
My job • ML Platform development • Bridging the skill gap between ML engineers and SREs • ML Reliability, SysML?, MLOps? • Automating ML systems from an SRE standpoint
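ML Platform • An in-house ML Platform • Kubernetes-based • Abstracts away the differences between the local environment and the cluster environment • A set of convenience APIs • Provides an environment for running Training/Serving easily on top of existing ML frameworks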
To be released as OSS eventually (probably)
Case study: real-time monitoring system • Nicknamed Lovemachine • Implemented on top of the ML Platform [Diagram labels: ML Platform training cluster, Lovemachine, GCS, GKE, PubSub, ML Platform serving cluster, Lovemachine]
Model Training & Serving Workflow
Workflow for Production [Diagram labels: CI, ML Platform training cluster (Job, Job, ...), Model Registry, ML Platform serving cluster for test (REST API, Streaming, TF Serving, ...)]
Training Workflow [Diagram labels: CI, ML Platform training cluster (Job, Job, ...), Model Registry] 1. A push to GitHub triggers training 2. The trained model is uploaded to the Model Registry
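Serving Workflow [Diagram labels: Model Registry, ML Platform serving cluster for test (REST API, Streaming, TF Serving, ...)] 1. The Model Registry is watched and new models are served automatically 2. When Serving & Test succeed, a k8s manifest for production is output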
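The two serving steps above can be sketched as one polling cycle. This is only an illustration under stated assumptions: `promote_new_models`, `build_manifest`, the `serve_and_test` callback, the model name and the storage paths are all hypothetical names, and the talk does not show how the actual ML Platform implements this. The manifest is built as a plain dictionary in the standard Kubernetes Deployment shape and can be serialized with `json.dumps`, since Kubernetes accepts JSON manifests.

```python
import json
from typing import Callable, Dict, Iterable, List


def build_manifest(model_name: str, version: str, model_uri: str) -> Dict:
    """Build a minimal Kubernetes Deployment for one model version (illustrative)."""
    labels = {"app": model_name, "model-version": version}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": f"{model_name}-{version}", "labels": labels},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": "serving",
                            # Public TensorFlow Serving image; an assumption,
                            # not necessarily what the platform deploys.
                            "image": "tensorflow/serving",
                            "args": [
                                f"--model_name={model_name}",
                                f"--model_base_path={model_uri}",
                            ],
                        }
                    ]
                },
            },
        },
    }


def promote_new_models(
    registry: Dict[str, str],
    already_served: Iterable[str],
    serve_and_test: Callable[[str, str], bool],
    model_name: str = "example-model",
) -> List[Dict]:
    """One polling cycle: return production manifests for new versions that pass."""
    seen = set(already_served)
    manifests = []
    for version, uri in sorted(registry.items()):
        if version in seen:
            continue  # handled in an earlier polling cycle
        # Step 1: serve the new model on the test cluster and test it.
        if serve_and_test(version, uri):
            # Step 2: only a passing model yields a production manifest.
            manifests.append(build_manifest(model_name, version, uri))
    return manifests
```

A caller would run `promote_new_models` periodically and write each returned manifest out, for example with `json.dumps(manifest, indent=2)`.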
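Example architecture of the Model Serving API [Diagram labels: Mercari API, REST, Flask, SK Model ×3, gRPC, TensorFlow Serving, TF Model ×2] Flask performs the preprocessing and forwards the request to the TensorFlow Serving instance behind it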
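The split described on this slide (a Python front end that preprocesses, then hands the features to a model server) can be sketched without any web framework. In this sketch the Flask route and the gRPC call are replaced by plain functions so that it runs with the standard library only; `preprocess`, `handle_request`, the bag-of-words features and the `backend` callable are illustrative assumptions, not the actual Lovemachine code.

```python
from typing import Callable, Dict, List, Sequence

# A backend takes a batch of feature vectors and returns one score per vector.
# In the architecture on the slide this role is played by TensorFlow Serving.
Backend = Callable[[List[List[float]]], List[float]]


def preprocess(text: str, vocabulary: Sequence[str]) -> List[float]:
    """Turn raw text into a bag-of-words count vector over a fixed vocabulary."""
    tokens = text.lower().split()
    return [float(tokens.count(word)) for word in vocabulary]


def handle_request(
    payload: Dict[str, str],
    vocabulary: Sequence[str],
    backend: Backend,
) -> Dict[str, float]:
    """What the front-end handler does: preprocess, forward, wrap the reply."""
    features = preprocess(payload["text"], vocabulary)
    scores = backend([features])  # forwarded as a batch of size one
    return {"score": scores[0]}
```

Keeping the backend behind a callable means the preprocessing logic can be tested with a fake backend, while production code would pass a function that calls the real model server.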
Example architecture of the Model Serving API, Streaming version [Diagram labels: PubSub, ML Platform Framework or Apache Beam, SK Model ×3, gRPC, TensorFlow Serving, TF Model ×2]
TensorFlow Serving • The serving environment provided by the TensorFlow project • Can serve TF models without going through the Python runtime • The standard implementation exposes its API over gRPC
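Models and container images • Whether or not to include a huge ML model in the container image • If not, where to place it • A trade-off between portability and load time • If you have a good idea, please let me know…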
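How it differs from an ordinary API • The resources handled, model size in particular, are often large (hundreds of MB to several GB) • CPU and memory consumption is heavy • In some cases GPUs are used as well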
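Memory consumption • The Python part of Lovemachine consumes 2 GB of memory at runtime → this is expected to grow further • The TF-IDF preprocessing part, written with scikit-learn, often becomes large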
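Python and parallelism • Naturally, threads are of no use for parallelism (because of the GIL) • Loading the model in every process makes the required memory size large → this gets in the way of Blue-Green Deploy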
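A rough calculation shows why per-process loading hurts. The sketch below takes the 2 GB figure quoted for Lovemachine and a hypothetical worker count of 4 (the talk gives no worker count), and doubles the total for a blue-green deploy, because the old and the new version must both be running during the switch-over. The function name `required_memory_gb` is made up for this illustration.

```python
def required_memory_gb(model_gb: float, workers: int, blue_green: bool = True) -> float:
    """Estimate memory needed when every worker process loads its own model copy."""
    per_version = model_gb * workers  # one full copy per worker process
    # During a blue-green switch-over both versions are live at the same time.
    return per_version * 2 if blue_green else per_version


# 2 GB model, 4 hypothetical workers
steady_state = required_memory_gb(2.0, 4, blue_green=False)   # 8.0 GB
during_deploy = required_memory_gb(2.0, 4, blue_green=True)   # 16.0 GB
```

If the workers could share a single copy of the model, the steady-state figure would stay close to the size of one model regardless of the worker count.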
Honestly, serving in Python is often painful from an infrastructure point of view…
Using memory wisely • Load the model before forking so that Copy on Write takes effect • This deliberately breaks the k8s "one process per container" convention
Copy On Write recap [Diagram: memory, parent process, child process. 1. The parent allocates Page A. 2. fork. Both processes reference the same region.]
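When a process rewrites the contents of memory… [Diagram: memory, parent process, child process, Page A, Page B. The OS allocates a separate region and copies the original data into it; the writing process then references that separate region.]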
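The load-then-fork pattern can be sketched with the standard library alone. This is a minimal illustration, assuming a POSIX system where `os.fork` is available; `load_model`, `score` and `serve_with_prefork` are hypothetical names, and the tiny dictionary stands in for a large model. One caveat for CPython: reference-count updates also write to memory pages, so even read-only use of the model can trigger some copying. `gc.freeze()` (Python 3.7+) only keeps the garbage collector from touching objects that already exist.

```python
import gc
import os
from typing import Dict, List


def load_model() -> Dict[str, float]:
    """Stand-in for an expensive model load (hypothetical weights)."""
    return {"cheap": 0.9, "watch": 0.4, "bag": 0.1}


def score(model: Dict[str, float], text: str) -> float:
    """Sum the weights of the known tokens in the text."""
    return sum(model.get(token, 0.0) for token in text.lower().split())


def serve_with_prefork(texts: List[str]) -> List[float]:
    """Load the model once, then fork one child per input text."""
    model = load_model()  # loaded in the parent, before any fork
    gc.freeze()           # exclude existing objects from later GC passes
    children = []
    for text in texts:
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: reads the inherited model pages without reloading them.
            os.close(read_fd)
            try:
                os.write(write_fd, repr(score(model, text)).encode())
            finally:
                os._exit(0)  # leave immediately, skipping parent cleanup code
        os.close(write_fd)   # parent keeps only the read end
        children.append((pid, read_fd))
    results = []
    for pid, read_fd in children:
        with os.fdopen(read_fd) as pipe:
            results.append(float(pipe.read()))
        os.waitpid(pid, 0)   # reap the child
    return results
```

A real server would fork long-lived workers rather than one child per request, but the ordering is the point here: the model is loaded once in the parent, so every child starts out sharing the same physical pages.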
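Current Issues • Because the system deals with human behavior, data trends shift easily and unexpected situations arise, so it needs continuous attention → the burden on the ML model authors never goes away → as SRE, we want to solve this with a mechanism that includes automation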
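In Progress • Turning the implementation that builds embeddings from in-house data into components • Automating, to some extent, the construction of models that solve specific problems → a dedicated AutoML-like something, specialized for solving in-house problems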
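AutoFlow (tentative name) [Diagram labels: Feature Extraction Components, Classification Components, Concatenation Components, Model Builder, Components Registry] Performs automatic model construction and automatic hyperparameter tuning on the cluster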
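Summary • ML needs infrastructure that is somewhat different from the usual → the best practices are not yet known • In the first place, running ML features seriously in production does not go well without substantial automation and systematization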
Thank you for listening!!
We are Hiring!!
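SRE ML Reliability • SysML? MLOps? A new job description • SRE skills + basic knowledge of ML • People who will push forward the automation and systematization of ML infrastructure • Of course, we are also eagerly hiring for other roles!!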
Details here: https://careers.mercari.com/