Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
ABEJA Platform での MLOps LINE×ABEJA MLOps Study ...
Search
Yusuke Ueno
April 24, 2019
Technology
0
730
ABEJA Platform での MLOps LINE×ABEJA MLOps Study @FUKUOKA
Yusuke Ueno
April 24, 2019
Tweet
Share
Other Decks in Technology
See All in Technology
KubeCon + CloudNativeCon Japan 2025 Recap
ren510dev
1
290
さくらのIaaS基盤のモニタリングとOpenTelemetry/OSC Hokkaido 2025
fujiwara3
2
130
OPENLOGI Company Profile for engineer
hr01
1
33k
asken AI勉強会(Android)
tadashi_sato
0
140
Connect 100+を支える技術
kanyamaguc
0
150
AI導入の理想と現実~コストと浸透〜
oprstchn
0
150
2025-06-26 GitHub CopilotとAI駆動開発:実践と導入のリアル
fl_kawachi
1
220
生成AI時代の開発組織・技術・プロセス 〜 ログラスの挑戦と考察 〜
itohiro73
1
370
Fabric + Databricks 2025.6 の最新情報ピックアップ
ryomaru0825
1
150
Model Mondays S2E03: SLMs & Reasoning
nitya
0
230
BrainPadプログラミングコンテスト記念LT会2025_社内イベント&問題解説
brainpadpr
1
180
「良さそう」と「とても良い」の間には 「良さそうだがホンマか」がたくさんある / 2025.07.01 LLM品質Night
smiyawaki0820
1
430
Featured
See All Featured
Fireside Chat
paigeccino
37
3.5k
Refactoring Trust on Your Teams (GOTO; Chicago 2020)
rmw
34
3.1k
The Art of Programming - Codeland 2020
erikaheidi
54
13k
Building a Modern Day E-commerce SEO Strategy
aleyda
42
7.4k
Dealing with People You Can't Stand - Big Design 2015
cassininazir
367
26k
Improving Core Web Vitals using Speculation Rules API
sergeychernyshev
17
950
Creating an realtime collaboration tool: Agile Flush - .NET Oxford
marcduiker
30
2.1k
XXLCSS - How to scale CSS and keep your sanity
sugarenia
248
1.3M
Producing Creativity
orderedlist
PRO
346
40k
Connecting the Dots Between Site Speed, User Experience & Your Business [WebExpo 2025]
tammyeverts
5
230
Facilitating Awesome Meetings
lara
54
6.4k
Practical Orchestrator
shlominoach
188
11k
Transcript
Software Engineer at ABEJA Yusuke Ueno ABEJA Platform Ͱͷ ML
Ops
ࠓ͢͜ͱ • ABEJA Platform ͱʁ • ػցֶशͷ࣮ݧཧʹ͍ͭͯ • ABEJA Platform
Ͱͷ࣮ݧཧͱͦͷ࣮
ABEJA Platform ͱʁ
Copyright © 2019 ABEJA, Inc. All rights reserved.
None
Copyright © 2019 ABEJA, Inc. All rights reserved. نײ
Copyright © 2019 ABEJA, Inc. All rights reserved. ML Ops
ͱ? DevOps ͜ͷΑ͏ͳҹ • Development ͱ Operation ؒͷϓϩηεվળ • ΞϓϦέʔγϣϯͷσϦόϦೳྗΛ͋͛ΔจԽతֶɺ ϓϥΫςΟεɺπʔϧ
Copyright © 2019 ABEJA, Inc. All rights reserved. ML Ops
͜͏ఆٛͯ͠Έ·͢ • ML Engineer ͱ Development ؒͷϓϩηεվળ • Ϗδωεʹద༻Ͱ͖Δਫ਼ΛͭϞσϧΛఏڙ͢Δೳྗ Λ্͛ΔจԽతֶɺϓϥΫςΟεɺπʔϧ
Copyright © 2019 ABEJA, Inc. All rights reserved. ࠓֶश෦ʹ͍ͭͯ
Copyright © 2019 ABEJA, Inc. All rights reserved. ֶश ΠςϨʔςΟϒͳ࡞ۀ
• ֶशίʔυͷ࡞ɾमਖ਼ • ҟͳΔΦϓςΟϚΠβͰͷࢼߦ • ϋΠύʔύϥϝʔλͷௐ • αϯϓϦϯάํ๏ͷमਖ਼ • ҟͳΔόʔδϣϯͷϥΠϒϥϦͷ༻ • ϥϯμϜγʔυͷมߋ
Copyright © 2019 ABEJA, Inc. All rights reserved. ࣮ݧͷཧ͕ॏཁ ҰճҰճͷ࣮ݧͷ݅ͱ݁ՌΛه͍ͯ͠ͳ͍ͱɺޙͰਫ਼
͕ྑ͔ͬͨ࣌ͷ࣮ݧΛ࠶ݱͰ͖ͳ͍ هͯ͠ɺӾཡͰ͖ΔΑ͏ʹ͓ͯ͘͠ඞཁ͕͋Δ
• σʔληοτ • ίʔυ • ύϥϝʔλ • ࣮ߦڥ • ࣮ݧ݁ՌʢධՁࢦඪʣ
• ॏΈύϥϝʔλ • ϩά • ࣮ߦ࣌ؒ ه • ࣮ݧ݁Ռͷൺֱ • ৄࡉใͷදࣔ • ࣮ݧ݅ • ࣮ݧ݁Ռ • ՄࢹԽʢը૾ͳͲʣ • ϝϯόʔؒͰͷڞ༗ • աڈͷ࣮ݧͷݕࡧ • Ӿཡ Ӿཡ
Copyright © 2019 ABEJA, Inc. All rights reserved. ࣮ݧཧͷશମ૾ {
} ֶशίʔυ ύϥϝʔλ ධՁ݁Ռ ॏΈϑΝΠϧ ϩά ࣮ߦ࣌ؒ ֶशδϣϒ σʔληοτ ࣮ߦڥ ϝϯόʔؒͰͷڞ༗ όʔδϣϯཧ ՄࢹԽ ֶशδϣϒؒͰͷൺֱ
Copyright © 2019 ABEJA, Inc. All rights reserved. ࣮ݧཧͷશମ૾ {
} ֶशίʔυ ύϥϝʔλ ධՁ݁Ռ ॏΈϑΝΠϧ ϩά ࣮ߦ࣌ؒ ֶशδϣϒ σʔληοτ ࣮ߦڥ
Copyright © 2019 ABEJA, Inc. All rights reserved. σʔληοτͷόʔδϣϯཧ ̎ͭͷίϯϙʔωϯτΛ༻ҙ
• Datalake • ΦϒδΣΫτετϨʔδ • Datasets • Datalake ΦϒδΣΫτͷࢀরใͱϝλσʔλ
Copyright © 2019 ABEJA, Inc. All rights reserved. σʔληοτͷόʔδϣϯཧ •
Annotation Tool ʹͯ Datalake ͷσʔλʹରͯ͠Ξϊςʔ γϣϯͨ݁͠ՌΛ Datasets ͱͯ͠ग़ྗ %BUBMBLF %BUBTFUT
Copyright © 2019 ABEJA, Inc. All rights reserved. σʔληοτͷόʔδϣϯཧ σʔλΛՃͨ͠߹ɺผͷ
datasets ͱͯ͠࡞Մೳ \^ \^ \^ ɾɾɾ GJMFT BOOPUBUJPOT EBUBTFUT WFSTJPO \^ \^ WFSTJPO
Copyright © 2019 ABEJA, Inc. All rights reserved. σʔληοτͷόʔδϣϯཧ tag
Ͱ datasets Λཧతʹׂ͠ಛఆͷཁૉͷΈΛநग़ ɾɾɾ EBUBTFUT UBH" UBH# \^ " \^ " \^ " \^ # \^ #
Copyright © 2019 ABEJA, Inc. All rights reserved. σʔληοτͷՄࢹԽ σʔληοτࣗମͷ֬ೝ͕Մೳ
Copyright © 2019 ABEJA, Inc. All rights reserved. ࣮ݧཧͷશମ૾ {
} ֶशίʔυ ύϥϝʔλ ධՁ݁Ռ ॏΈϑΝΠϧ ϩά ࣮ߦ࣌ؒ ֶशδϣϒ σʔληοτ ࣮ߦڥ
Copyright © 2019 ABEJA, Inc. All rights reserved. ࣮ߦڥ Platform
Ͱ Python RuntimeɺओཁͳϑϨʔϜϫʔΫɺϥΠ ϒϥϦશ෦ೖΓͷ Docker Image Λఏڙ
Copyright © 2019 ABEJA, Inc. All rights reserved. ֶशίʔυɾύϥϝʔλ •
ֶशΛ࣮ߦ͢Δ Python ίʔυ • Platform ্Ͱݺͼग़͞ΕΔؔΛ࣮ • Docker Image ʹඞཁͳ Python ϥΠϒϥϦ͕ͳ͍߹ʹ requirements.txt ʹՃ • ༩͑ͨύϥϝʔλڥมͱͯ͠ίʔυͰऔಘՄೳ
Copyright © 2019 ABEJA, Inc. All rights reserved. ༻͢Δσʔληοτɺֶशίʔυɺύϥϝʔλɺ࣮ߦ ڥΛ·ͱΊͯɺ࣮ߦͰ͖Δঢ়ଶͰόʔδϣχϯάͯ͠ཧ
ֶशδϣϒఆٛόʔδϣϯ ֶशίʔυ { } ύϥϝʔλ σʔληοτ ࣮ߦڥ
Copyright © 2019 ABEJA, Inc. All rights reserved. ֶशδϣϒఆٛόʔδϣϯͱύϥϝʔλɺΠϯελϯελ ΠϓΛࢦఆֶͯ͠शδϣϒΛ࣮ߦ
ֶशδϣϒ࣮ߦ ֶशίʔυ { } ύϥϝʔλ ֶशδϣϒఆٛόʔδϣϯ { } ্ॻ͖ύϥϝʔλ ֶशδϣϒ σʔληοτ ΠϯελϯελΠϓ ʴ ه ࣮ߦڥ
Copyright © 2019 ABEJA, Inc. All rights reserved. ࣮ݧཧͷશମ૾ {
} ֶशίʔυ ύϥϝʔλ ධՁ݁Ռ ॏΈϑΝΠϧ ϩά ࣮ߦ࣌ؒ ֶशδϣϒ σʔληοτ ࣮ߦڥ
Copyright © 2019 ABEJA, Inc. All rights reserved. ֶशδϣϒͷ࣮ߦͱ݁Ռͷཧ •
kubernetes ( EKS ) Λ༻ • Ҏલ kubernetes on EC2 • nvidia-device-plugin Λ༻ͯ͠ GPU Λೝࣝ • spotinst ͰΫϥελΦʔτεέʔϦϯά • ָʹෳͷΠϯελϯεͰͷεέʔϧ͕Մೳ • p2 ܥɺ p3 ܥΠϯελϯε
Copyright © 2019 ABEJA, Inc. All rights reserved. ࣮ݧཧͷશମ૾ {
} ֶशίʔυ ύϥϝʔλ ධՁ݁Ռ ॏΈϑΝΠϧ ϩά ࣮ߦ࣌ؒ ֶशδϣϒ σʔληοτ ࣮ߦڥ
Copyright © 2019 ABEJA, Inc. All rights reserved. ֶशδϣϒ •
k8s ͷ Job ͱֶͯ͠शίʔυʹύϥϝʔλΛ༩࣮͑ͯߦ • SDK Λ༻ͯ͠ɺΤϙοΫ͝ͱͷਫ਼Λߋ৽ ΠϯελϯελΠϓ ࣮ߦڥ 4%, ਫ਼Λอଘ ֶशίʔυ { } ύϥϝʔλ ධՁ݁Ռ
Copyright © 2019 ABEJA, Inc. All rights reserved. ࣮ݧཧͷશମ૾ {
} ֶशίʔυ ύϥϝʔλ ධՁ݁Ռ ॏΈϑΝΠϧ ϩά ࣮ߦ࣌ؒ ֶशδϣϒ σʔληοτ ࣮ߦڥ
Copyright © 2019 ABEJA, Inc. All rights reserved. ཧܥίϯςφ ֶशδϣϒͱಉ͡ϊʔυʹஔ͠ɺग़ྗͱͳΔͷΛอଘ
&'4Ͱͷڞ༗ϑΝΠϧετϨʔδ ֶशδϣϒ "HFOU 5FOTPS#PBSE 'MVFOUE Ϛϯτ εςʔλεࢹ ग़ྗϑΝΠϧอଘ ެ։ ϩάΛऔಘ อଘ
Copyright © 2019 ABEJA, Inc. All rights reserved. Fluentd ίϯςφ
ֶशδϣϒ͕ग़ྗ͢Δඪ४ग़ྗΛอଘ • k8s ͷ DaemonSet ͰίϯςφΛஔ • શͯͷϊʔυʹ̍ͭͷ Fluentd ίϯςφΛ࣮ߦ • جຊతʹ /var/log/containers/*.log Λࢹͯ͠ɺ͜ΕΒ ͷϩάΛ֎෦ͷετϨʔδʹอଘ • Pod ͕ফ͑Δͱϩάফ͑ͯ͠·͏
Copyright © 2019 ABEJA, Inc. All rights reserved. Fluentd ίϯςφ
• RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR ͷઃ ఆ࣍ୈͰɺNoisy Neighbor ʹͳΔ͔ɺResource Limit ʹΑΓ OOM Killer Ͱࡴ͞Εͯ͠·͏
Copyright © 2019 ABEJA, Inc. All rights reserved. TensorBoard ίϯςφ
ֶशδϣϒ͕ग़ྗ͢ΔΠϕϯτϩάͷՄࢹԽ • Inter-Pod Affinity Λ༻ͯ͠ Job ͱಉ͡ϊʔυʹஔ • Job ͱಉ͡ϑΝΠϧγεςϜΛϚϯτ͠ɺϩάΛಡΈ ࠐΈදࣔ • k8s ͷ Service ͷ Node Port Ͱ internal ʹ expose ͠ɺ ͷ Gateway ͕ೝূ͖Ͱެ։
Copyright © 2019 ABEJA, Inc. All rights reserved. Agent ίϯςφ
ֶशδϣϒͷεςʔλεࢹɾ։࢝ / ऴྃ࣌ࠁΛه • Job ͷεςʔλεΛϙʔϦϯάͯ͠ه • Job ͱಉ͡ϑΝΠϧγεςϜΛϚϯτ͠ɺֶशδϣϒ ͷऴྃͱͱʹग़ྗϑΝΠϧΛอଘ ֶशδϣϒ "HFOU εςʔλεࢹɾߋ৽ ग़ྗϑΝΠϧอଘ
Copyright © 2019 ABEJA, Inc. All rights reserved. ࣮ݧཧͷશମ૾ {
} ֶशίʔυ ύϥϝʔλ ධՁ݁Ռ ॏΈϑΝΠϧ ϩά ࣮ߦ࣌ؒ ֶशδϣϒ σʔληοτ ࣮ߦڥ
Copyright © 2019 ABEJA, Inc. All rights reserved.
Copyright © 2019 ABEJA, Inc. All rights reserved. ML Ops
• ML Engineer ͱ Development ؒͷϓϩηεվળ • Ϗδωεʹద༻Ͱ͖Δਫ਼ΛͭϞσϧΛఏڙ͢Δೳྗ Λ্͛ΔจԽతֶɺϓϥΫςΟεɺπʔϧ
Copyright © 2019 ABEJA, Inc. All rights reserved. ML Engineer
ͱ Development ؒͷϓϩηεվળ ཁٻΛຬͨ͢Ϟσϧ͕Ͱ͖ΔͱଞͷαʔϏε͕ར༻Α͏ʹެ։ • ୭͕ຊ൪͚ͷίʔυΛॻ͔͘ʁ • Data Scientist ͕ॻ͍ͨίʔυΛॻ͖͞ͳ͍ͱ͍͚ͳ͍ • ॻ͖͢ͱਫ਼͕࠶ݱ͠ͳ͍… • Ϟσϧͷߋ৽͕ଟ͗͢ → αʔϏεͷߋ৽ճ૿Ճ • ʑ…
Copyright © 2019 ABEJA, Inc. All rights reserved. ML Engineer
ͱ Development ؒͷϓϩηεվળ Development ଆֶश݁ՌͱਪίʔυͱΈ߹Θͤͯ όʔδϣϯཧՄೳ ਪίʔυ ֶश݁Ռ ॏΈϑΝΠϧ ࣮ߦڥ ධՁ݁Ռ ॏΈϑΝΠϧ ࣮ߦڥ ධՁ݁Ռ δϣϒ̍ δϣϒ̎ ॏΈϑΝΠϧ ࣮ߦڥ Ϟσϧ
Copyright © 2019 ABEJA, Inc. All rights reserved. ML Engineer
ͱ Development ؒͷϓϩηεվળ Ϟσϧͦͷ·· Web API ͱͯ͠ެ։Մೳ Ϟσϧߋ৽࣌ Web API Λ҆શʹߋ৽Մೳ ਪίʔυ Ϟσϧ ॏΈϑΝΠϧ ࣮ߦڥ ॏΈϑΝΠϧ ࣮ߦڥ Ϟσϧ̍ Ϟσϧ̎ ਪίʔυ 8FC"1* 8FC"1* σϓϩΠ ΤϯυϙΠϯτ Γସ͑Մೳ
Copyright © 2019 ABEJA, Inc. All rights reserved. Platform ͰͷϞσϧཧશମ
{ } ֶशίʔυ ύϥϝʔλ ධՁ݁Ռ ॏΈϑΝΠϧ ϩά ࣮ߦ࣌ؒ ֶशδϣϒ σʔληοτ ࣮ߦڥ ਪίʔυ ॏΈϑΝΠϧ ࣮ߦڥ
Copyright © 2019 ABEJA, Inc. All rights reserved. ·ͱΊ •
࣮ݧཧ໘͕ͩɺΒͳ͍ͱޙͰࠔΔ • ֶशͷೖྗͱͳΔ༻͢Δσʔληοτɺֶशίʔυɺ ࣮ߦڥͳͲΛ·ͱΊͯόʔδϣϯཧ • ग़ྗ݁Ռͷอଘग़དྷΔ͚ͩ։ൃऀʹෛ୲Λ͔͚ͳ͍ܗ Ͱ Platform ଆͰ࣮ • αʔϏεԽ͢ΔϞσϧͱֶशδϣϒͷ݁Ռͷඥ͚ͯτ ϨʔαϏϦςΟΛ୲อ