Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
mlct.pdf
Search
Hirofumi Nakagawa/中河 宏文
July 23, 2018
Programming
2
2.1k
mlct.pdf
Hirofumi Nakagawa/中河 宏文
July 23, 2018
Tweet
Share
More Decks by Hirofumi Nakagawa/中河 宏文
See All by Hirofumi Nakagawa/中河 宏文
IoTデバイスでMLモデルを動かす技術
hnakagawa
0
210
Kanazawa_AI.pdf
hnakagawa
0
200
メルカリ写真検索における Amazon EKS の活用事例と プロダクトにおけるEdgeAI technologyの展望
hnakagawa
5
9.1k
メルカリの写真検索を支えるバックエンド CCSE 2019 version
hnakagawa
0
350
メルカリ写真検索における Amazon EKS の活用事例
hnakagawa
6
29k
メルカリの写真検索を支えるバックエンド
hnakagawa
1
1.2k
Mercari ML Platform
hnakagawa
1
17k
機械学習によるマーケット健全化施策を支える技術
hnakagawa
0
270
メルカリのマーケット健全化施策を支えるML基盤
hnakagawa
10
9.1k
Other Decks in Programming
See All in Programming
なぜSQLはAIぽく見えるのか/why does SQL look AI like
florets1
0
450
AI によるインシデント初動調査の自動化を行う AI インシデントコマンダーを作った話
azukiazusa1
1
700
Automatic Grammar Agreementと Markdown Extended Attributes について
kishikawakatsumi
0
180
LLM Observabilityによる 対話型音声AIアプリケーションの安定運用
gekko0114
2
420
AI巻き込み型コードレビューのススメ
nealle
1
150
CSC307 Lecture 09
javiergs
PRO
1
830
今から始めるClaude Code超入門
448jp
8
8.6k
20260127_試行錯誤の結晶を1冊に。著者が解説 先輩データサイエンティストからの指南書 / author's_commentary_ds_instructions_guide
nash_efp
1
940
AIによるイベントストーミング図からのコード生成 / AI-powered code generation from Event Storming diagrams
nrslib
2
1.9k
Amazon Bedrockを活用したRAGの品質管理パイプライン構築
tosuri13
4
260
AI時代のキャリアプラン「技術の引力」からの脱出と「問い」へのいざない / tech-gravity
minodriven
20
7k
Fluid Templating in TYPO3 14
s2b
0
130
Featured
See All Featured
Leadership Guide Workshop - DevTernity 2021
reverentgeek
1
200
Darren the Foodie - Storyboard
khoart
PRO
2
2.4k
Highjacked: Video Game Concept Design
rkendrick25
PRO
1
280
How to Talk to Developers About Accessibility
jct
2
130
Imperfection Machines: The Place of Print at Facebook
scottboms
269
14k
Designing for Timeless Needs
cassininazir
0
130
The SEO identity crisis: Don't let AI make you average
varn
0
67
Jamie Indigo - Trashchat’s Guide to Black Boxes: Technical SEO Tactics for LLMs
techseoconnect
PRO
0
55
Evolution of real-time – Irina Nazarova, EuRuKo, 2024
irinanazarova
9
1.2k
Building Applications with DynamoDB
mza
96
6.9k
VelocityConf: Rendering Performance Case Studies
addyosmani
333
24k
Game over? The fight for quality and originality in the time of robots
wayneb77
1
110
Transcript
ϝϧΧϦͷMLج൫ MLCT vol.5 hnakagawa
ࣗݾհ • Hirofumi Nakagawa (hnakagawa) • 20177݄ೖࣾ • ॴଐSRE •
σόΠευϥΠό։ൃ͔Βϑϩϯ τΤϯυ։ൃ·ͰΔԿͰ • NOT σʔλαΠΤϯςΟετ • https://github.com/hnakagawa
͓ࣄ • ML Platform։ൃ • σʔλαΠΤϯςΟετͱSREͷεΩϧΪϟο ϓΛຒΊΔ • ML Reliability,
SysML?, MLOps? • SREͷཱ͔ΒMLγεςϜͷࣗಈԽΛߦ͏
ML Platform • ͷML Platform • kubernetesϕʔε • طଘͷML FrameworkΛ༻͠
؆୯ʹTraining/ServingΛߦ͏ ڥΛఏڙ
ͦͷ͏ͪOSSͰެ։༧ఆ(ଟ
ϝϧΧϦͷMLར༻ࣄྫ • ײಈग़ • ҧग़ݕ • Ձ֨αδΣετ • ΤΠταδΣετ ʑ…
̍ઍສpredictionΛߦ͍ͬͯΔ
ML Platform Architecture ,VCFSOFUFT $POUSPMMFS $-* $MVTUFS8PSLGMPX %BTICPBSE 4UPSBHF(BUFXBZ .FUSJDT
3VOOFS $PNQPOFOU .FSDBSJ.- $PNQPOFOU &YUFSOBM .JEEMFXBSF
Model Training & Serving Workflow
.-1MBUGPSN USBJOJOHDMVTUFS Workflow for Production $* .-1MBUGPSN TFSWJOHDMVTUFSGPSUFTU .PEFM3FHJTUSZ +PC
+PC ɾɾ 3&45 "1* 4USFBNJOH 5'4FSW JOH ɾɾɾ
.-1MBUGPSN USBJOJOHDMVTUFS Training Workflow $* .PEFM3FHJTUSZ +PC +PC ɾɾɾ 1.
GitHubͷpushΛτϦΨʹtrainingΛىಈ 2. Training͞ΕͨModelModel Registry ্͕Δ
Serving Workflow .-1MBUGPSN TFSWJOHDMVTUFSGPSUFTU .PEFM3FHJTUSZ ɾɾ 3&45 "1* 4USFBNJOH 5'
4FSWJOH 1. Model RegistryΛࢹͯࣗ͠ಈͰModel ΛServing 2. Serving&Test͕ޭ͢Δͱຊ൪༻k8s manifestΛग़ྗ
Container Workflow %BUB4PVSDF *NBHF 5FYUɹ 1SFQSPDFT TJOH *NBHF &TUJNBUPS *NBHF
17 17 1JDUVSF 1SFQSPDFT TJOH *NBHF 17 It’s own implementation
Model Serving APIͷߏྫ 5FOTPS'MPX 4FSWJOH 5' .PEFM 5' .PEFM 'MBTL
4, .PEFM 4, .PEFM 4, .PEFM gRPC .FSDBSJ"1* REST FlaskͰલॲཧΛߦ͍ ཪͷTensorFlow Servingʹ͍͛ͯΔ
Model Serving API Streaming ver ͷߏྫ 5FOTPS'MPX 4FSWJOH 5' .PEFM
5' .PEFM .-1MBUGPSN 'SBNFXPSL PS "QBDIF#FBN 4, .PEFM 4, .PEFM 4, .PEFM gRPC PubSub
ModelͱίϯςφɾΠϝʔδ • ڊେͳML ModelΛίϯςφɾΠϝʔδʹؚΊ Δ͔൱͔ • ؚΊͳ͍ͷͰ͋ΕԿॲʹஔ͢Δ͔ • ϙʔλϏϦςΟੑͱϩʔυ࣌ؒͷτϨʔυΦϑ •
ྑ͍ΞΠσΟΞ͕͋Εڭ͑ͯԼ͍͞…
௨ৗͷAPIͱಛੑ͕ҧ͏ • ѻ͏ϦιʔεɺModelαΠζ͕େ͖͘ͳΔ ߹͕ଟ͍(ඦMBʙGB) • CPUɾϝϞϦϦιʔεͷফඅ͕ܹ͍͠ • ߹ʹΑͬͯGPU͏
ϝϞϦফඅ • ҧݕγεςϜͷPython࣮෦࣮ߦ࣌ ʹ2GBϝϞϦΛফඅ͢Δˠࠓޙ͞Βʹ૿͑ Δ༧ఆ͋Δ • Scikit-learnͰهड़͞Εͨલॲཧ෦͕େ͖͘ ͳΓ͕ͪ
Pythonͱฒྻੑ • વThread͕͑ͳ͍(GILͷͨΊ) • ϓϩηεຖʹModelΛϩʔυ͢Δͱඞཁͳϝ ϞϦαΠζ͕େ͖͘ͳΔˠ Blue-Green DeployͷোʹͳΔ
ਖ਼PythonͰͷServing Πϯϑϥతʹਏ͍ࣄ͕ଟ͍…
ϝϞϦΛݡ͘͏ • fork͢ΔલʹmodelΛϩʔυ͠Copy on Write Λޮ͔͢ • k8sͷone process per
containerηΦϦ͋ ͑ͯഁ͍ͬͯΔ
Copy On Writeͷ෮श ϝϞϦ ϓϩηε ࢠϓϩηε 2.fork 1BHF" 1.allocation ಉ͡ྖҬΛࢀর
ϓϩηε͕ϝϞϦͷ༰Λ ॻ͖͑Δͱ… ϝϞϦ ϓϩηε ࢠϓϩηε 1BHF" 1BHF# OS͕ผͷྖҬΛAllocationͯ͠ݩσʔλΛίϐʔ͢Δ ผͷྖҬΛࢀর
Current Issues
ߴͳܧଓతϝϯςφϯε͕ඞཁ • MLػೳσʔλͷ͕มΘͬͨΓɺ༧֎ ͷ͕ൃੜͨ͠Γͯ͠ɺͦΕΒʹରԠ͠ଓ ͚Δඞཁ͕͋Δ MLػೳϦϦʔεޙେ͖ͳ ίετ͕͔͔Γଓ͚Δ
େ෯ͳࣗಈԽ͕ඞਢ
In Progress
ߴͳࣗಈԽ • ࣾͷσʔλ͔ΒFeature Extraction͢Δ࣮ ΛίϯϙʔωϯτԽ • ಛఆͷΛղܾ͢ΔϞσϧߏஙΛ͋Δఔ ࣗಈԽ • ϦϦʔεޙͷRe-TrainingɺHyper
parameter optimizationɺDeployΛࣗಈԽ
AutoFlow 'FBUVSF&YUSBDUJPO $PNQPOFOUT $MBTTJGJDBUJPO $PNQPOFOUT $PODBUFOBUJPO $PNQPOFOUT .PEFM #VJMEFS $PNQPOFOUT
3FHJTUSZ Ϋϥελ্ͰϞσϧͷࣗಈߏஙͱϋΠύʔύϥ ϝʔλͷࣗಈௐΛߦ͏
AutoServing %FQMPZ ϦϦʔεޙͷਫ਼ࢹɾRe-TrainingɾRe-Deploy ΛࣗಈͰߦ͏ .POJUPSJOH &WBMVBUJPO )ZQFS QBSBNFUFS PQUJNJ[BUJPO 3F5SBJOJOH
·ͱΊ • MLʹগ͠௨ৗͱҧ͏Πϯϑϥ͕ඞཁʹͳΔ ˠ·ͩϕετɾϓϥΫςΟε͔Βͳ͍ • ͦͦMLͳػೳΛຊ֨ӡ༻͠Α͏ͱ͢Δ ͱɺେ෯ͳࣗಈԽɾΈԽΛਐΊͳ͍ͱ্ ख͘ߦ͔ͳ͍
͝ਗ਼ௌ͋Γ͕ͱ͏͍͟͝·ͨ͠!!