メルカリのマーケット健全化施策を支えるML基盤
by
Hirofumi Nakagawa/中河 宏文
×
Copy
Open
Share
Embed
Copy iframe code
Copy JS code
Copy link
Start on current slide
Slide 1
Slide 1 text
ϝϧΧϦͷϚʔέοτ݈શԽ ࢪࡦΛࢧ͑ΔMLج൫ Mercari ML Ops Night Vol.1 hnakagawa
Slide 2
Slide 2 text
ࣗݾհ • Hirofumi Nakagawa (hnakagawa) • 20177݄ೖࣾ • ॴଐSRE • σόΠευϥΠό։ൃ͔Βϑϩϯ τΤϯυ։ൃ·ͰΔԿͰ • NOT MLΤϯδχΞ • https://github.com/hnakagawa
Slide 3
Slide 3 text
͓ࣄ • ML Platform։ൃ • MLΤϯδχΞͱSREͷεΩϧΪϟοϓΛຒΊ Δ • ML Reliability, SysML?, MLOps? • SREͷཱ͔ΒMLγεςϜͷࣗಈԽΛߦ͏
Slide 4
Slide 4 text
ML Platform • ͷML Platform • kubernetesϕʔε • ϩʔΧϧڥͱΫϥελڥͷ ࠩΛநԽ͢Δ • ศརAPI܈ • طଘͷML FrameworkΛ༻͠ ؆୯ʹTraining/ServingΛߦ͏ ڥΛఏڙ
Slide 5
Slide 5 text
ͦͷ͏ͪOSSͰެ։༧ఆ(ଟ
Slide 6
Slide 6 text
ࣄྫ ϦΞϧλΠϜࢹγεςϜ • ௨শ Lovemachine • ML Platform্ʹ࣮͞Ε͍ͯΔ .-1MBUGPSN USBJOJOHDMVTUFS -PWFNBDIJOF ($4 GKE PubSub .-1MBUGPSN TFSWJOHDMVTUFS -PWFNBDIJOF
Slide 7
Slide 7 text
Model Training & Serving Workflow
Slide 8
Slide 8 text
.-1MBUGPSN USBJOJOHDMVTUFS Workflow for Production $* .-1MBUGPSN TFSWJOHDMVTUFSGPSUFTU .PEFM3FHJTUSZ +PC +PC ɾɾ 3&45 "1* 4USFBNJOH 5' 4FSWJOH ɾɾɾ
Slide 9
Slide 9 text
.-1MBUGPSN USBJOJOHDMVTUFS Training Workflow $* .PEFM3FHJTUSZ +PC +PC ɾɾɾ 1. GitHubͷpushΛτϦΨʹtrainingΛىಈ 2. Training͞ΕͨModelModel Registry ্͕Δ
Slide 10
Slide 10 text
Serving Workflow .-1MBUGPSN TFSWJOHDMVTUFSGPSUFTU .PEFM3FHJTUSZ ɾɾ 3&45 "1* 4USFBNJOH 5' 4FSWJOH ɾɾɾ 1. Model RegistryΛࢹͯࣗ͠ಈͰModel ΛServing 2. Serving&Test͕ޭ͢Δͱຊ൪༻k8s manifestΛग़ྗ
Slide 11
Slide 11 text
Model Serving APIͷߏྫ 5FOTPS'MPX 4FSWJOH 5' .PEFM 5' .PEFM 'MBTL 4, .PEFM 4, .PEFM 4, .PEFM gRPC .FSDBSJ"1* REST FlaskͰલॲཧΛߦ͍ ཪͷTensorFlow Servingʹ͍͛ͯΔ
Slide 12
Slide 12 text
Model Serving API Streaming ver ͷߏྫ 5FOTPS'MPX 4FSWJOH 5' .PEFM 5' .PEFM .-1MBUGPSN 'SBNFXPSL PS "QBDIF#FBN 4, .PEFM 4, .PEFM 4, .PEFM gRPC PubSub
Slide 13
Slide 13 text
TensorFlow Serving • TensorFlow project͕ఏڙͯ͠ ͍ΔServingڥ • PythonॲཧܥΛհͣ͞ʹTFͷ modelΛservingͰ͖Δ • ඪ४ͷ࣮ͰgRPCͰAPIΛ ఏڙ
Slide 14
Slide 14 text
ModelͱίϯςφɾΠϝʔδ • ڊେͳML ModelΛίϯςφɾΠϝʔδʹؚΊ Δ͔൱͔ • ؚΊͳ͍ͷͰ͋ΕԿॲʹஔ͢Δ͔ • ϙʔλϏϦςΟੑͱϩʔυ࣌ؒͷτϨʔυΦϑ • ྑ͍ΞΠσΟΞ͕͋Εڭ͑ͯԼ͍͞…
Slide 15
Slide 15 text
௨ৗͷAPIͱҧ͏ • ѻ͏ϦιʔεɺModelαΠζ͕େ͖͘ͳΔ ߹͕ଟ͍(ඦMBʙGB) • CPUɾϝϞϦϦιʔεͷফඅ͕ܹ͍͠ • ߹ʹΑͬͯGPU͏
Slide 16
Slide 16 text
ϝϞϦফඅ • LovemachineͷPython࣮෦࣮ߦ࣌ʹ 2GBϝϞϦΛফඅ͢Δˠࠓޙ͞Βʹ૿͑Δ༧ ఆ͋Δ • Scikit-learnͰهड़͞ΕͨTF-IDFͷલॲཧ෦ ͕େ͖͘ͳΔࣄ͕ଟ͍
Slide 17
Slide 17 text
Pythonͱฒྻੑ • વThread͕͑ͳ͍(GILͷͨΊ) • ϓϩηεຖʹModelΛϩʔυ͢Δͱඞཁͳϝ ϞϦαΠζ͕େ͖͘ͳΔˠ Blue-Green DeployͷোʹͳΔ
Slide 18
Slide 18 text
ਖ਼PythonͰͷServing Πϯϑϥతʹਏ͍ࣄ͕ଟ͍…
Slide 19
Slide 19 text
ϝϞϦΛݡ͘͏ • fork͢ΔલʹmodelΛϩʔυ͠Copy on Write Λޮ͔͢ • k8sͷone process per containerηΦϦ͋ ͑ͯഁ͍ͬͯΔ
Slide 20
Slide 20 text
Copy On Writeͷ෮श ϝϞϦ ϓϩηε ࢠϓϩηε 2.fork 1BHF" 1.allocation ಉ͡ྖҬΛࢀর
Slide 21
Slide 21 text
ϓϩηε͕ϝϞϦͷ༰Λ ॻ͖͑Δͱ… ϝϞϦ ϓϩηε ࢠϓϩηε 1BHF" 1BHF# OS͕ผͷྖҬΛAllocationͯ͠ݩσʔλΛίϐʔ͢Δ ผͷྖҬΛࢀর
Slide 22
Slide 22 text
Current Issues • ਓؒͷߦಈΛ૬खʹ͍ͯ͠Δҝɺσʔλͷ ͕มΘΓ͔ͬͨ͢Γɺ༧֎ͷ͕ൃ ੜͨ͠Γͯ͠ɺରԠ͠ଓ͚Δඞཁ͕͋Δ ˠ ML Model࡞ऀʹෛ୲ֻ͕͔Γଓ͚Δ ˠ SREͱͯࣗ͠ಈԽΛؚΜͩΈͰղܾ ͍ͨ͠
Slide 23
Slide 23 text
In Progress • ࣾͷσʔλ͔ΒEmbedding͢Δ࣮Λίϯ ϙʔωϯτԽ • ಛఆͷΛղܾ͢ΔϞσϧߏஙΛ͋Δఔ ࣗಈԽ ˠࣾͷղܾʹಛԽͨ͠ઐ༻ͷAutoMLత ͳԿ͔
Slide 24
Slide 24 text
AutoFlow(Ծ) 'FBUVSF&YUSBDUJPO $PNQPOFOUT $MBTTJpDBUJPO $PNQPOFOUT $PODBUFOBUJPO $PNQPOFOUT .PEFM #VJMEFS $PNQPOFOUT 3FHJTUSZ Ϋϥελ্ͰϞσϧͷࣗಈߏஙͱϋΠύʔύϥ ϝʔλͷࣗಈௐΛߦ͏
Slide 25
Slide 25 text
·ͱΊ • MLʹগ͠௨ৗͱҧ͏Πϯϑϥ͕ඞཁʹͳΔ ˠ·ͩϕετɾϓϥΫςΟε͔Βͳ͍ • ͦͦMLͳػೳΛຊ֨ӡ༻͠Α͏ͱ͢Δ ͱɺେ෯ͳࣗಈԽɾΈԽΛਐΊͳ͍ͱ্ ख͘ߦ͔ͳ͍
Slide 26
Slide 26 text
͝ਗ਼ௌ͋Γ͕ͱ͏͍͟͝·ͨ͠!!
Slide 27
Slide 27 text
We are Hiring!!
Slide 28
Slide 28 text
SRE ML Reliability • SysML? MLOps? ৽͍͠Job description • SREεΩϧ+MLͷجૅࣝ • MLΠϯϑϥͷࣗಈԽɾΈԽΛਪ͠ਐΊͯ ͘ΕΔਓࡐ • ͪΖΜଞͷ৬छઈࢍืूத!!
Slide 29
Slide 29 text
ৄࡉͪ͜Β https://careers.mercari.com/