メルカリのマーケット健全化施策を支えるML基盤
by
Hirofumi Nakagawa/中河 宏文
Link
Embed
Share
Beginning
This slide
Copy link URL
Copy link URL
Copy iframe embed code
Copy iframe embed code
Copy javascript embed code
Copy javascript embed code
Share
Tweet
Share
Tweet
Slide 1
Slide 1 text
ϝϧΧϦͷϚʔέοτ݈શԽ ࢪࡦΛࢧ͑ΔMLج൫ Mercari ML Ops Night Vol.1 hnakagawa
Slide 2
Slide 2 text
ࣗݾհ • Hirofumi Nakagawa (hnakagawa) • 20177݄ೖࣾ • ॴଐSRE • σόΠευϥΠό։ൃ͔Βϑϩϯ τΤϯυ։ൃ·ͰΔԿͰ • NOT MLΤϯδχΞ • https://github.com/hnakagawa
Slide 3
Slide 3 text
͓ࣄ • ML Platform։ൃ • MLΤϯδχΞͱSREͷεΩϧΪϟοϓΛຒΊ Δ • ML Reliability, SysML?, MLOps? • SREͷཱ͔ΒMLγεςϜͷࣗಈԽΛߦ͏
Slide 4
Slide 4 text
ML Platform • ͷML Platform • kubernetesϕʔε • ϩʔΧϧڥͱΫϥελڥͷ ࠩΛநԽ͢Δ • ศརAPI܈ • طଘͷML FrameworkΛ༻͠ ؆୯ʹTraining/ServingΛߦ͏ ڥΛఏڙ
Slide 5
Slide 5 text
ͦͷ͏ͪOSSͰެ։༧ఆ(ଟ
Slide 6
Slide 6 text
ࣄྫ ϦΞϧλΠϜࢹγεςϜ • ௨শ Lovemachine • ML Platform্ʹ࣮͞Ε͍ͯΔ .-1MBUGPSN USBJOJOHDMVTUFS -PWFNBDIJOF ($4 GKE PubSub .-1MBUGPSN TFSWJOHDMVTUFS -PWFNBDIJOF
Slide 7
Slide 7 text
Model Training & Serving Workflow
Slide 8
Slide 8 text
.-1MBUGPSN USBJOJOHDMVTUFS Workflow for Production $* .-1MBUGPSN TFSWJOHDMVTUFSGPSUFTU .PEFM3FHJTUSZ +PC +PC ɾɾ 3&45 "1* 4USFBNJOH 5' 4FSWJOH ɾɾɾ
Slide 9
Slide 9 text
.-1MBUGPSN USBJOJOHDMVTUFS Training Workflow $* .PEFM3FHJTUSZ +PC +PC ɾɾɾ 1. GitHubͷpushΛτϦΨʹtrainingΛىಈ 2. Training͞ΕͨModelModel Registry ্͕Δ
Slide 10
Slide 10 text
Serving Workflow .-1MBUGPSN TFSWJOHDMVTUFSGPSUFTU .PEFM3FHJTUSZ ɾɾ 3&45 "1* 4USFBNJOH 5' 4FSWJOH ɾɾɾ 1. Model RegistryΛࢹͯࣗ͠ಈͰModel ΛServing 2. Serving&Test͕ޭ͢Δͱຊ൪༻k8s manifestΛग़ྗ
Slide 11
Slide 11 text
Model Serving APIͷߏྫ 5FOTPS'MPX 4FSWJOH 5' .PEFM 5' .PEFM 'MBTL 4, .PEFM 4, .PEFM 4, .PEFM gRPC .FSDBSJ"1* REST FlaskͰલॲཧΛߦ͍ ཪͷTensorFlow Servingʹ͍͛ͯΔ
Slide 12
Slide 12 text
Model Serving API Streaming ver ͷߏྫ 5FOTPS'MPX 4FSWJOH 5' .PEFM 5' .PEFM .-1MBUGPSN 'SBNFXPSL PS "QBDIF#FBN 4, .PEFM 4, .PEFM 4, .PEFM gRPC PubSub
Slide 13
Slide 13 text
TensorFlow Serving • TensorFlow project͕ఏڙͯ͠ ͍ΔServingڥ • PythonॲཧܥΛհͣ͞ʹTFͷ modelΛservingͰ͖Δ • ඪ४ͷ࣮ͰgRPCͰAPIΛ ఏڙ
Slide 14
Slide 14 text
ModelͱίϯςφɾΠϝʔδ • ڊେͳML ModelΛίϯςφɾΠϝʔδʹؚΊ Δ͔൱͔ • ؚΊͳ͍ͷͰ͋ΕԿॲʹஔ͢Δ͔ • ϙʔλϏϦςΟੑͱϩʔυ࣌ؒͷτϨʔυΦϑ • ྑ͍ΞΠσΟΞ͕͋Εڭ͑ͯԼ͍͞…
Slide 15
Slide 15 text
௨ৗͷAPIͱҧ͏ • ѻ͏ϦιʔεɺModelαΠζ͕େ͖͘ͳΔ ߹͕ଟ͍(ඦMBʙGB) • CPUɾϝϞϦϦιʔεͷফඅ͕ܹ͍͠ • ߹ʹΑͬͯGPU͏
Slide 16
Slide 16 text
ϝϞϦফඅ • LovemachineͷPython࣮෦࣮ߦ࣌ʹ 2GBϝϞϦΛফඅ͢Δˠࠓޙ͞Βʹ૿͑Δ༧ ఆ͋Δ • Scikit-learnͰهड़͞ΕͨTF-IDFͷલॲཧ෦ ͕େ͖͘ͳΔࣄ͕ଟ͍
Slide 17
Slide 17 text
Pythonͱฒྻੑ • વThread͕͑ͳ͍(GILͷͨΊ) • ϓϩηεຖʹModelΛϩʔυ͢Δͱඞཁͳϝ ϞϦαΠζ͕େ͖͘ͳΔˠ Blue-Green DeployͷোʹͳΔ
Slide 18
Slide 18 text
ਖ਼PythonͰͷServing Πϯϑϥతʹਏ͍ࣄ͕ଟ͍…
Slide 19
Slide 19 text
ϝϞϦΛݡ͘͏ • fork͢ΔલʹmodelΛϩʔυ͠Copy on Write Λޮ͔͢ • k8sͷone process per containerηΦϦ͋ ͑ͯഁ͍ͬͯΔ
Slide 20
Slide 20 text
Copy On Writeͷ෮श ϝϞϦ ϓϩηε ࢠϓϩηε 2.fork 1BHF" 1.allocation ಉ͡ྖҬΛࢀর
Slide 21
Slide 21 text
ϓϩηε͕ϝϞϦͷ༰Λ ॻ͖͑Δͱ… ϝϞϦ ϓϩηε ࢠϓϩηε 1BHF" 1BHF# OS͕ผͷྖҬΛAllocationͯ͠ݩσʔλΛίϐʔ͢Δ ผͷྖҬΛࢀর
Slide 22
Slide 22 text
Current Issues • ਓؒͷߦಈΛ૬खʹ͍ͯ͠Δҝɺσʔλͷ ͕มΘΓ͔ͬͨ͢Γɺ༧֎ͷ͕ൃ ੜͨ͠Γͯ͠ɺରԠ͠ଓ͚Δඞཁ͕͋Δ ˠ ML Model࡞ऀʹෛ୲ֻ͕͔Γଓ͚Δ ˠ SREͱͯࣗ͠ಈԽΛؚΜͩΈͰղܾ ͍ͨ͠
Slide 23
Slide 23 text
In Progress • ࣾͷσʔλ͔ΒEmbedding͢Δ࣮Λίϯ ϙʔωϯτԽ • ಛఆͷΛղܾ͢ΔϞσϧߏஙΛ͋Δఔ ࣗಈԽ ˠࣾͷղܾʹಛԽͨ͠ઐ༻ͷAutoMLత ͳԿ͔
Slide 24
Slide 24 text
AutoFlow(Ծ) 'FBUVSF&YUSBDUJPO $PNQPOFOUT $MBTTJpDBUJPO $PNQPOFOUT $PODBUFOBUJPO $PNQPOFOUT .PEFM #VJMEFS $PNQPOFOUT 3FHJTUSZ Ϋϥελ্ͰϞσϧͷࣗಈߏஙͱϋΠύʔύϥ ϝʔλͷࣗಈௐΛߦ͏
Slide 25
Slide 25 text
·ͱΊ • MLʹগ͠௨ৗͱҧ͏Πϯϑϥ͕ඞཁʹͳΔ ˠ·ͩϕετɾϓϥΫςΟε͔Βͳ͍ • ͦͦMLͳػೳΛຊ֨ӡ༻͠Α͏ͱ͢Δ ͱɺେ෯ͳࣗಈԽɾΈԽΛਐΊͳ͍ͱ্ ख͘ߦ͔ͳ͍
Slide 26
Slide 26 text
͝ਗ਼ௌ͋Γ͕ͱ͏͍͟͝·ͨ͠!!
Slide 27
Slide 27 text
We are Hiring!!
Slide 28
Slide 28 text
SRE ML Reliability • SysML? MLOps? ৽͍͠Job description • SREεΩϧ+MLͷجૅࣝ • MLΠϯϑϥͷࣗಈԽɾΈԽΛਪ͠ਐΊͯ ͘ΕΔਓࡐ • ͪΖΜଞͷ৬छઈࢍืूத!!
Slide 29
Slide 29 text
ৄࡉͪ͜Β https://careers.mercari.com/