Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
mlct.pdf
Search
Sponsored
·
SiteGround - Reliable hosting with speed, security, and support you can count on.
→
Hirofumi Nakagawa/中河 宏文
July 23, 2018
Programming
2.1k
2
Share
mlct.pdf
Hirofumi Nakagawa/中河 宏文
July 23, 2018
More Decks by Hirofumi Nakagawa/中河 宏文
See All by Hirofumi Nakagawa/中河 宏文
IoTデバイスでMLモデルを動かす技術
hnakagawa
0
220
Kanazawa_AI.pdf
hnakagawa
0
220
メルカリ写真検索における Amazon EKS の活用事例と プロダクトにおけるEdgeAI technologyの展望
hnakagawa
5
9.2k
メルカリの写真検索を支えるバックエンド CCSE 2019 version
hnakagawa
0
360
メルカリ写真検索における Amazon EKS の活用事例
hnakagawa
6
29k
メルカリの写真検索を支えるバックエンド
hnakagawa
1
1.2k
Mercari ML Platform
hnakagawa
1
17k
機械学習によるマーケット健全化施策を支える技術
hnakagawa
0
270
メルカリのマーケット健全化施策を支えるML基盤
hnakagawa
10
9.2k
Other Decks in Programming
See All in Programming
10 Tips of AWS ~Gen AI on AWS~
licux
5
510
ハーネスエンジニアリングにどう向き合うか 〜ルールファイルを超えて開発プロセスを設計する〜 / How to approach harness engineering
rkaga
24
16k
ソフトウェア設計の結合バランス #phperkaigi
kajitack
0
160
個人的に嬉しかったpnpmの新機能・3選
matsuo_atsushi
0
110
セグメントとターゲットを意識するプロポーザルの書き方 〜採択の鍵は、誰に刺すかを見極めるマーケティング戦略にある〜
m3m0r7
PRO
0
640
アーキテクチャモダナイゼーションとは何か
nwiizo
19
5.6k
Cache-moi si tu peux : patterns et pièges du cache en production - Devoxx France 2026 - Conférence
slecache
0
320
UIの境界線をデザインする | React Tokyo #15 メイントーク
sasagar
2
400
実用!Hono RPC2026
yodaka
2
280
Explore CoroutineScope
tomoeng11
0
120
The Monolith Strikes Back: Why AI Agents ❤️ Rails Monoliths
serradura
0
370
의존성 주입과 모듈화
fornewid
0
150
Featured
See All Featured
How GitHub (no longer) Works
holman
316
150k
jQuery: Nuts, Bolts and Bling
dougneiner
66
8.4k
The Impact of AI in SEO - AI Overviews June 2024 Edition
aleyda
5
820
エンジニアに許された特別な時間の終わり
watany
106
240k
The Curse of the Amulet
leimatthew05
1
12k
So, you think you're a good person
axbom
PRO
2
2k
The Straight Up "How To Draw Better" Workshop
denniskardys
239
140k
Improving Core Web Vitals using Speculation Rules API
sergeychernyshev
21
1.5k
Self-Hosted WebAssembly Runtime for Runtime-Neutral Checkpoint/Restore in Edge–Cloud Continuum
chikuwait
0
500
AI Search: Implications for SEO and How to Move Forward - #ShenzhenSEOConference
aleyda
1
1.2k
B2B Lead Gen: Tactics, Traps & Triumph
marketingsoph
0
110
Music & Morning Musume
bryan
47
7.2k
Transcript
ϝϧΧϦͷMLج൫ MLCT vol.5 hnakagawa
ࣗݾհ • Hirofumi Nakagawa (hnakagawa) • 20177݄ೖࣾ • ॴଐSRE •
σόΠευϥΠό։ൃ͔Βϑϩϯ τΤϯυ։ൃ·ͰΔԿͰ • NOT σʔλαΠΤϯςΟετ • https://github.com/hnakagawa
͓ࣄ • ML Platform։ൃ • σʔλαΠΤϯςΟετͱSREͷεΩϧΪϟο ϓΛຒΊΔ • ML Reliability,
SysML?, MLOps? • SREͷཱ͔ΒMLγεςϜͷࣗಈԽΛߦ͏
ML Platform • ͷML Platform • kubernetesϕʔε • طଘͷML FrameworkΛ༻͠
؆୯ʹTraining/ServingΛߦ͏ ڥΛఏڙ
ͦͷ͏ͪOSSͰެ։༧ఆ(ଟ
ϝϧΧϦͷMLར༻ࣄྫ • ײಈग़ • ҧग़ݕ • Ձ֨αδΣετ • ΤΠταδΣετ ʑ…
̍ઍສpredictionΛߦ͍ͬͯΔ
ML Platform Architecture ,VCFSOFUFT $POUSPMMFS $-* $MVTUFS8PSLGMPX %BTICPBSE 4UPSBHF(BUFXBZ .FUSJDT
3VOOFS $PNQPOFOU .FSDBSJ.- $PNQPOFOU &YUFSOBM .JEEMFXBSF
Model Training & Serving Workflow
.-1MBUGPSN USBJOJOHDMVTUFS Workflow for Production $* .-1MBUGPSN TFSWJOHDMVTUFSGPSUFTU .PEFM3FHJTUSZ +PC
+PC ɾɾ 3&45 "1* 4USFBNJOH 5'4FSW JOH ɾɾɾ
.-1MBUGPSN USBJOJOHDMVTUFS Training Workflow $* .PEFM3FHJTUSZ +PC +PC ɾɾɾ 1.
GitHubͷpushΛτϦΨʹtrainingΛىಈ 2. Training͞ΕͨModelModel Registry ্͕Δ
Serving Workflow .-1MBUGPSN TFSWJOHDMVTUFSGPSUFTU .PEFM3FHJTUSZ ɾɾ 3&45 "1* 4USFBNJOH 5'
4FSWJOH 1. Model RegistryΛࢹͯࣗ͠ಈͰModel ΛServing 2. Serving&Test͕ޭ͢Δͱຊ൪༻k8s manifestΛग़ྗ
Container Workflow %BUB4PVSDF *NBHF 5FYUɹ 1SFQSPDFT TJOH *NBHF &TUJNBUPS *NBHF
17 17 1JDUVSF 1SFQSPDFT TJOH *NBHF 17 It’s own implementation
Model Serving APIͷߏྫ 5FOTPS'MPX 4FSWJOH 5' .PEFM 5' .PEFM 'MBTL
4, .PEFM 4, .PEFM 4, .PEFM gRPC .FSDBSJ"1* REST FlaskͰલॲཧΛߦ͍ ཪͷTensorFlow Servingʹ͍͛ͯΔ
Model Serving API Streaming ver ͷߏྫ 5FOTPS'MPX 4FSWJOH 5' .PEFM
5' .PEFM .-1MBUGPSN 'SBNFXPSL PS "QBDIF#FBN 4, .PEFM 4, .PEFM 4, .PEFM gRPC PubSub
ModelͱίϯςφɾΠϝʔδ • ڊେͳML ModelΛίϯςφɾΠϝʔδʹؚΊ Δ͔൱͔ • ؚΊͳ͍ͷͰ͋ΕԿॲʹஔ͢Δ͔ • ϙʔλϏϦςΟੑͱϩʔυ࣌ؒͷτϨʔυΦϑ •
ྑ͍ΞΠσΟΞ͕͋Εڭ͑ͯԼ͍͞…
௨ৗͷAPIͱಛੑ͕ҧ͏ • ѻ͏ϦιʔεɺModelαΠζ͕େ͖͘ͳΔ ߹͕ଟ͍(ඦMBʙGB) • CPUɾϝϞϦϦιʔεͷফඅ͕ܹ͍͠ • ߹ʹΑͬͯGPU͏
ϝϞϦফඅ • ҧݕγεςϜͷPython࣮෦࣮ߦ࣌ ʹ2GBϝϞϦΛফඅ͢Δˠࠓޙ͞Βʹ૿͑ Δ༧ఆ͋Δ • Scikit-learnͰهड़͞Εͨલॲཧ෦͕େ͖͘ ͳΓ͕ͪ
Pythonͱฒྻੑ • વThread͕͑ͳ͍(GILͷͨΊ) • ϓϩηεຖʹModelΛϩʔυ͢Δͱඞཁͳϝ ϞϦαΠζ͕େ͖͘ͳΔˠ Blue-Green DeployͷোʹͳΔ
ਖ਼PythonͰͷServing Πϯϑϥతʹਏ͍ࣄ͕ଟ͍…
ϝϞϦΛݡ͘͏ • fork͢ΔલʹmodelΛϩʔυ͠Copy on Write Λޮ͔͢ • k8sͷone process per
containerηΦϦ͋ ͑ͯഁ͍ͬͯΔ
Copy On Writeͷ෮श ϝϞϦ ϓϩηε ࢠϓϩηε 2.fork 1BHF" 1.allocation ಉ͡ྖҬΛࢀর
ϓϩηε͕ϝϞϦͷ༰Λ ॻ͖͑Δͱ… ϝϞϦ ϓϩηε ࢠϓϩηε 1BHF" 1BHF# OS͕ผͷྖҬΛAllocationͯ͠ݩσʔλΛίϐʔ͢Δ ผͷྖҬΛࢀর
Current Issues
ߴͳܧଓతϝϯςφϯε͕ඞཁ • MLػೳσʔλͷ͕มΘͬͨΓɺ༧֎ ͷ͕ൃੜͨ͠Γͯ͠ɺͦΕΒʹରԠ͠ଓ ͚Δඞཁ͕͋Δ MLػೳϦϦʔεޙେ͖ͳ ίετ͕͔͔Γଓ͚Δ
େ෯ͳࣗಈԽ͕ඞਢ
In Progress
ߴͳࣗಈԽ • ࣾͷσʔλ͔ΒFeature Extraction͢Δ࣮ ΛίϯϙʔωϯτԽ • ಛఆͷΛղܾ͢ΔϞσϧߏஙΛ͋Δఔ ࣗಈԽ • ϦϦʔεޙͷRe-TrainingɺHyper
parameter optimizationɺDeployΛࣗಈԽ
AutoFlow 'FBUVSF&YUSBDUJPO $PNQPOFOUT $MBTTJGJDBUJPO $PNQPOFOUT $PODBUFOBUJPO $PNQPOFOUT .PEFM #VJMEFS $PNQPOFOUT
3FHJTUSZ Ϋϥελ্ͰϞσϧͷࣗಈߏஙͱϋΠύʔύϥ ϝʔλͷࣗಈௐΛߦ͏
AutoServing %FQMPZ ϦϦʔεޙͷਫ਼ࢹɾRe-TrainingɾRe-Deploy ΛࣗಈͰߦ͏ .POJUPSJOH &WBMVBUJPO )ZQFS QBSBNFUFS PQUJNJ[BUJPO 3F5SBJOJOH
·ͱΊ • MLʹগ͠௨ৗͱҧ͏Πϯϑϥ͕ඞཁʹͳΔ ˠ·ͩϕετɾϓϥΫςΟε͔Βͳ͍ • ͦͦMLͳػೳΛຊ֨ӡ༻͠Α͏ͱ͢Δ ͱɺେ෯ͳࣗಈԽɾΈԽΛਐΊͳ͍ͱ্ ख͘ߦ͔ͳ͍
͝ਗ਼ௌ͋Γ͕ͱ͏͍͟͝·ͨ͠!!