Yusuke Ueno
April 24, 2019
MLOps on ABEJA Platform: LINE×ABEJA MLOps Study @FUKUOKA
Transcript
Yusuke Ueno, Software Engineer at ABEJA
ML Ops on ABEJA Platform

Today's topics
• What is ABEJA Platform?
• Experiment management in machine learning
• Experiment management on ABEJA Platform and its implementation

What is ABEJA Platform?
Copyright © 2019 ABEJA, Inc. All rights reserved.
Sense of scale

What is ML Ops?
My impression of DevOps:
• Process improvement between Development and Operation
• A cultural philosophy, practices, and tools that raise the ability to deliver applications

ML Ops, as I will define it here:
• Process improvement between ML Engineer and Development
• A cultural philosophy, practices, and tools that raise the ability to provide models accurate enough to be applied to the business

Today's focus: the training part

Training is iterative work
• Writing and revising training code
• Trying different optimizers
• Tuning hyperparameters
• Revising the sampling method
• Using different library versions
• Changing the random seed

Managing experiments matters
If the conditions and results of each experiment are not recorded, the experiment that gave good accuracy cannot be reproduced later.
Experiments need to be recorded and kept browsable.

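Record
• Dataset
• Code
• Parameters
• Execution environment
• Experiment results (evaluation metrics)
• Weight parameters
• Logs
• Execution time
Browse
• Comparison of experiment results
• Display of details: experiment conditions, experiment results, visualizations (images, etc.)
• Sharing among team members
• Search over past experiments
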
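As an illustration of what "record and browse" can mean in practice, here is a minimal sketch of an experiment record in Python. It is not ABEJA Platform code: the class `ExperimentRecord`, its field names, and the helper `best_run` are hypothetical names chosen to mirror the items listed on the slide.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ExperimentRecord:
    """One training run: the inputs needed to reproduce it plus its outputs."""
    dataset_version: str
    code_version: str
    parameters: dict
    environment: str                       # e.g. a Docker image tag (assumption)
    metrics: dict = field(default_factory=dict)
    weights_path: str = ""
    log_path: str = ""
    elapsed_seconds: float = 0.0


def best_run(records, metric):
    """Return the record with the highest value of `metric`."""
    scored = [r for r in records if metric in r.metrics]
    return max(scored, key=lambda r: r.metrics[metric])


runs = [
    ExperimentRecord("v1", "abc123", {"lr": 0.1}, "image:1", {"accuracy": 0.81}),
    ExperimentRecord("v1", "abc123", {"lr": 0.01}, "image:1", {"accuracy": 0.88}),
]

# Serializing the record keeps conditions and results together for later browsing.
print(json.dumps(asdict(best_run(runs, "accuracy")), indent=2))
```

Keeping conditions and results in one serializable record is what makes later comparison and search possible; where the records are stored is a separate decision.
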
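Overall picture of experiment management
Diagram labels: training code, parameters, evaluation results, weight file, logs, execution time, training job, dataset, execution environment, sharing among members, version management, visualization, comparison between training jobs
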
Dataset version management
Two components are provided:
• Datalake: object storage
• Datasets: references to Datalake objects, plus metadata

Dataset version management
• The Annotation Tool annotates data in the Datalake and outputs the result as Datasets
Diagram: Datalake, Datasets

Dataset version management
When data is added, it can be created as a separate datasets.
Diagram: files and their annotations ({}) grouped into separate datasets versions

Dataset version management
Tags split a datasets logically so that only specific elements are extracted.
Diagram: a datasets whose items ({}) are marked tagA or tagB

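The two ideas on these slides (adding data produces a separate dataset, and tags select a subset) can be sketched as follows. This is only an illustration under assumptions: `DatasetItem`, `new_version`, `select_by_tag`, and the `datalake://` references are hypothetical and are not the ABEJA Platform data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetItem:
    file_ref: str     # reference to an object in the data lake (hypothetical URI)
    annotation: str   # annotation attached to that object
    tag: str          # logical partition label


def new_version(previous, added):
    """Adding data yields a new tuple; the previous version stays untouched."""
    return previous + tuple(added)


def select_by_tag(items, tag):
    """Extract only the items carrying the given tag."""
    return tuple(item for item in items if item.tag == tag)


v1 = (
    DatasetItem("datalake://channel/1", "cat", "A"),
    DatasetItem("datalake://channel/2", "dog", "B"),
)
v2 = new_version(v1, [DatasetItem("datalake://channel/3", "cat", "A")])

print(len(v1), len(v2))
print([item.file_ref for item in select_by_tag(v2, "A")])
```

Because a dataset holds references rather than copies, a new version costs little storage, and an experiment that recorded `v1` can still be rerun against exactly the same items.
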
Dataset visualization
The dataset itself can be inspected.

Execution environment
The Platform provides a Docker Image that bundles the Python runtime, the major frameworks, and libraries.

Training code and parameters
• Python code that runs the training
• Implement the function that is called on the Platform
• If the Docker Image lacks a required Python library, add it to requirements.txt
• The parameters you pass are available to the code as environment variables

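A minimal sketch of reading parameters from environment variables is shown below. The variable names (`NUM_EPOCHS`, `BATCH_SIZE`, `LEARNING_RATE`) and the function name `handler` are assumptions made for illustration; the slides do not state which names or entry-point signature the Platform uses.

```python
import os


def read_params(env=os.environ):
    """Read hyperparameters from environment variables, with defaults.

    Environment values are always strings, so each one is converted explicitly.
    """
    return {
        "epochs": int(env.get("NUM_EPOCHS", "10")),
        "batch_size": int(env.get("BATCH_SIZE", "32")),
        "learning_rate": float(env.get("LEARNING_RATE", "0.001")),
    }


def handler(env=os.environ):
    """Hypothetical entry point that a platform would call to start training."""
    params = read_params(env)
    # A real handler would build and train a model here using `params`.
    return params


# Passing a plain dict instead of os.environ makes the function easy to test.
print(handler({"NUM_EPOCHS": "3", "LEARNING_RATE": "0.01"}))
```
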
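Training job definition version
The dataset to use, the training code, the parameters, and the execution environment are bundled, versioned, and managed in a runnable state.
Diagram labels: training code, parameters, dataset, execution environment
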
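Running a training job
A training job is run by specifying a training job definition version, parameters, and an instance type.
Diagram labels: training job definition version (training code, parameters, dataset, execution environment) + override parameters + instance type; the training job is recorded
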
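The relationship between a versioned job definition and a concrete run can be sketched like this. The names `JobDefinitionVersion` and `start_job`, and all field names, are hypothetical; the sketch only illustrates that run-time parameters override the defaults stored in the definition and that the merged result is what gets recorded.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class JobDefinitionVersion:
    """Everything needed to run training, frozen under one version number."""
    version: int
    code_ref: str
    dataset_ref: str
    environment: str
    default_params: dict = field(default_factory=dict)


def start_job(definition, instance_type, override_params=None):
    """Merge override parameters over the defaults and return the run record."""
    params = {**definition.default_params, **(override_params or {})}
    return {
        "definition_version": definition.version,
        "instance_type": instance_type,
        "params": params,
    }


definition = JobDefinitionVersion(
    version=3,
    code_ref="train.py@abc123",
    dataset_ref="dataset-v2",
    environment="image:1",
    default_params={"lr": 0.1, "epochs": 10},
)
job = start_job(definition, "gpu-small", {"lr": 0.01})
print(job)
```

Recording the merged parameters with the job, rather than only the overrides, means a run can be reproduced without looking up what the defaults were at the time.
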
Running training jobs and managing results
• Uses kubernetes (EKS)
• Previously kubernetes on EC2
• Uses nvidia-device-plugin so that GPUs are recognized
• Cluster autoscaling with spotinst
• Scaling across multiple instances is easy
• p2 and p3 family instances

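Training job
• Runs as a k8s Job, with the parameters passed to the training code
• The SDK is used to update the accuracy after each epoch
Diagram labels: instance type, execution environment, training code, parameters, SDK saves accuracy, evaluation results
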
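To make "run as a k8s Job with parameters" concrete, here is a sketch that builds a Job manifest as a Python dictionary. The manifest fields (`batch/v1`, `restartPolicy`, `env`, and the `nvidia.com/gpu` resource advertised by the NVIDIA device plugin) are standard Kubernetes, but the function name, the label key `training-job`, and the image name are assumptions, and the actual manifest used by the Platform is not shown in the slides.

```python
import json


def training_job_manifest(name, image, params, gpus=1):
    """Build a Kubernetes Job manifest that passes params as environment variables."""
    # Kubernetes requires env values to be strings.
    env = [{"name": key, "value": str(value)} for key, value in sorted(params.items())]
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name, "labels": {"training-job": name}},
        "spec": {
            "backoffLimit": 0,  # do not retry a failed training run automatically
            "template": {
                "metadata": {"labels": {"training-job": name}},
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "train",
                        "image": image,
                        "env": env,
                        # GPU resource name exposed by the NVIDIA device plugin
                        "resources": {"limits": {"nvidia.com/gpu": gpus}},
                    }],
                },
            },
        },
    }


manifest = training_job_manifest("job-42", "example/train:1", {"LEARNING_RATE": 0.01})
print(json.dumps(manifest, indent=2))
```
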
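Management containers
Placed on the same node as the training job; they save whatever the job outputs.
Diagram labels: shared file storage on EFS (mounted); training job; Agent (status monitoring, saving output files); TensorBoard (publishing); Fluentd (collecting and saving logs)
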
Fluentd container
Saves the standard output written by the training job.
• Containers are placed with a k8s DaemonSet
• One Fluentd container runs on every node
• Basically, it watches /var/log/containers/*.log and saves these logs to external storage
• When a Pod disappears, its logs disappear too

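A collector that watches /var/log/containers/*.log has to work out which Pod each file belongs to. The sketch below parses the file name, assuming the common kubelet naming convention `<pod>_<namespace>_<container>-<64-hex container id>.log`. That convention is an assumption about the cluster setup, and the function name is hypothetical; this is not Fluentd's or the Platform's own code.

```python
import re
from pathlib import PurePosixPath

# Assumed layout: <pod>_<namespace>_<container>-<64 hex chars>.log
LOG_NAME = re.compile(
    r"^(?P<pod>[^_]+)_(?P<namespace>[^_]+)_"
    r"(?P<container>.+)-(?P<container_id>[0-9a-f]{64})\.log$"
)


def parse_container_log_path(path):
    """Return pod, namespace, container and container id, or None if no match."""
    match = LOG_NAME.match(PurePosixPath(path).name)
    return match.groupdict() if match else None


example = "/var/log/containers/job-42-x7k2p_training_train-" + "0" * 64 + ".log"
print(parse_container_log_path(example))
```

With the Pod name recovered, log lines can be stored under a key for the training job, so they remain available after the Pod is gone.
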
Fluentd container
• Depending on the RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR setting, the container either becomes a noisy neighbor or is killed by the OOM killer because of its resource limit

TensorBoard container
Visualizes the event logs written by the training job.
• Placed on the same node as the Job using Inter-Pod Affinity
• Mounts the same file system as the Job, then reads and displays the logs
• Exposed internally through a NodePort k8s Service; a Gateway publishes it with authentication

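Inter-Pod Affinity is expressed in the Pod spec. The sketch below adds a required `podAffinity` rule that pins a Pod to the node where a Pod with a matching label already runs. The affinity fields and the `kubernetes.io/hostname` topology key are standard Kubernetes; the function name, the label key `training-job`, and the image name are assumptions for illustration.

```python
def colocate_with_job(pod_spec, job_label_value, label_key="training-job"):
    """Return a copy of pod_spec that must be scheduled on the training job's node."""
    affinity = {
        "podAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": [{
                # Match the Pod(s) created by the training Job via their label.
                "labelSelector": {"matchLabels": {label_key: job_label_value}},
                # "Same topology domain" here means "same node".
                "topologyKey": "kubernetes.io/hostname",
            }]
        }
    }
    return {**pod_spec, "affinity": affinity}


tensorboard_spec = colocate_with_job(
    {"containers": [{"name": "tensorboard", "image": "example/tensorboard:1"}]},
    "job-42",
)
print(tensorboard_spec["affinity"])
```
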
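Agent container
Monitors the status of the training job and records its start and end times.
• Polls the Job status and records it
• Mounts the same file system as the Job and saves the output files when the training job finishes
Diagram labels: training job; Agent (status monitoring and updates, saving output files)
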
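The polling behavior described for the Agent can be sketched as a small loop. Everything here is hypothetical: `watch_job`, the status strings, and the injected `get_status` and `save_outputs` callables stand in for calls to the Kubernetes API and to storage, which the slides do not detail.

```python
import time

TERMINAL_STATUSES = {"Complete", "Failed"}


def watch_job(get_status, save_outputs, interval=1.0, clock=time.time, sleep=time.sleep):
    """Poll a job until it reaches a terminal status, then save its outputs."""
    record = {"started_at": clock(), "history": []}
    while True:
        status = get_status()
        # Record only status changes, not every poll.
        if not record["history"] or record["history"][-1] != status:
            record["history"].append(status)
        if status in TERMINAL_STATUSES:
            break
        sleep(interval)
    record["finished_at"] = clock()
    record["status"] = status
    save_outputs()
    return record


# Simulated job: the status source and the output saver are stand-ins.
statuses = iter(["Pending", "Running", "Running", "Complete"])
saved = []
result = watch_job(lambda: next(statuses), lambda: saved.append("outputs"), sleep=lambda s: None)
print(result["history"], saved)
```

Injecting the clock and sleep functions keeps the loop testable without waiting in real time.
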
ML Ops
• Process improvement between ML Engineer and Development
• A cultural philosophy, practices, and tools that raise the ability to provide models accurate enough to be applied to the business

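Process improvement between ML Engineer and Development
Once a model meets the requirements, it is published so that other services can use it.
• Who writes the production code?
• Code written by a Data Scientist has to be rewritten
• After the rewrite, the accuracy cannot be reproduced…
• Models are updated too often → the service has to be updated more often
• And so on…
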
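Process improvement between ML Engineer and Development
The Development side can version a training result together with inference code.
Diagram labels: Job 1 and Job 2, each with a training result (weight file, execution environment, evaluation results); model (inference code, weight file, execution environment)
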
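Process improvement between ML Engineer and Development
A model can be published as a Web API as is.
When a model is updated, the Web API can be updated safely.
Diagram labels: Model 1 and Model 2 (inference code, weight file, execution environment), each deployed as a Web API; the endpoint can be switched between them
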
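The "deploy each model, then switch the endpoint" idea can be illustrated with an in-memory sketch. The class `Endpoints` and its methods are hypothetical and say nothing about how ABEJA Platform implements deployments; the point is only that a new model version is deployed alongside the old one, and traffic moves when the endpoint mapping changes.

```python
class Endpoints:
    """Maps stable endpoint names to interchangeable model deployments."""

    def __init__(self):
        self._deployments = {}  # deployment id -> callable that runs inference
        self._routes = {}       # endpoint name -> deployment id

    def deploy(self, deployment_id, predict):
        """Register a model version without sending any traffic to it yet."""
        self._deployments[deployment_id] = predict

    def switch(self, endpoint, deployment_id):
        """Point the endpoint at a deployment; refuse unknown deployments."""
        if deployment_id not in self._deployments:
            raise KeyError(f"unknown deployment: {deployment_id}")
        self._routes[endpoint] = deployment_id

    def call(self, endpoint, payload):
        """Run inference through whichever deployment the endpoint points at."""
        return self._deployments[self._routes[endpoint]](payload)


api = Endpoints()
api.deploy("model-1", lambda x: {"model": 1, "score": x * 0.5})
api.deploy("model-2", lambda x: {"model": 2, "score": x * 0.9})

api.switch("default", "model-1")
before = api.call("default", 1.0)
api.switch("default", "model-2")   # clients keep calling the same endpoint
after = api.call("default", 1.0)
print(before, after)
```

Since the old deployment stays registered, rolling back is the same operation as rolling forward: switch the endpoint to the previous deployment id.
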
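Overall model management on the Platform
Diagram labels: training job (training code, parameters, evaluation results, weight file, logs, execution time), dataset, execution environment; model (inference code, weight file, execution environment)
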
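Summary
• Experiment management is tedious, but skipping it causes trouble later
• The inputs to training (the dataset used, the training code, the execution environment, and so on) are bundled and versioned together
• Saving the outputs is implemented on the Platform side so that it burdens developers as little as possible
• Linking each model put into service to the results of its training job ensures traceability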