MLOps on ABEJA Platform (LINE×ABEJA MLOps Study @FUKUOKA)
Yusuke Ueno
April 24, 2019
Transcript
ML Ops on ABEJA Platform
Yusuke Ueno, Software Engineer at ABEJA
Today's topics
• What is ABEJA Platform?
• Experiment management for machine learning
• Experiment management on ABEJA Platform and its implementation
What is ABEJA Platform?
Copyright © 2019 ABEJA, Inc. All rights reserved.
(Two figure-only slides; no text recoverable.)
What is ML Ops?
My impression of DevOps:
• Process improvement between Development and Operation
• A cultural philosophy, practices, and tools that increase the ability to deliver applications
ML Ops
A working definition for this talk:
• Process improvement between ML Engineers and Development
• A cultural philosophy, practices, and tools that increase the ability to deliver models accurate enough to apply to the business
Today's focus: the training part
Training is iterative work
• Writing and revising training code
• Trying different optimizers
• Tuning hyperparameters
• Revising the sampling method
• Using different library versions
• Changing the random seed
Managing experiments matters
Unless the conditions and results of every single experiment are recorded, the run that achieved good accuracy cannot be reproduced later.
Experiments need to be recorded and kept browsable.
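Record
• Dataset
• Code
• Parameters
• Execution environment
• Experiment results (evaluation metrics)
• Weight parameters
• Logs
• Run time
Browse
• Comparison of experiment results
• Display of detailed information
  • Experiment conditions
  • Experiment results
  • Visualization (images, etc.)
• Sharing among members
• Searching past experiments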
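The "Record" and "Browse" lists above can be sketched as a single record type plus a comparison helper. This is only an illustration: `ExperimentRecord`, its field names, and `best_run` are hypothetical and are not part of ABEJA Platform.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class ExperimentRecord:
    """Hypothetical record of one training run (names are illustrative)."""
    dataset_version: str
    code_version: str
    parameters: dict
    environment: str                      # e.g. a Docker image tag
    metrics: dict = field(default_factory=dict)
    weights_path: str = ""
    log_path: str = ""
    elapsed_seconds: float = 0.0


def best_run(records, metric):
    """Return the record with the highest value of the given metric."""
    return max(records, key=lambda r: r.metrics.get(metric, float("-inf")))


runs = [
    ExperimentRecord("ds-v1", "abc123", {"lr": 0.1}, "image:1", {"accuracy": 0.81}),
    ExperimentRecord("ds-v1", "abc123", {"lr": 0.01}, "image:1", {"accuracy": 0.88}),
]

# Browsing: pick the best run and show everything needed to reproduce it.
print(json.dumps(asdict(best_run(runs, "accuracy")), indent=2))
```

Because every run carries its dataset, code, parameters, and environment, the best run can be looked up and rerun later instead of being reconstructed from memory.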
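Overview of experiment management
Diagram elements: training job, training code, parameters { }, dataset, execution environment, evaluation results, weight files, logs, run time, version management, visualization, sharing among members, comparison across training jobs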
Overview of experiment management (training job only)
Diagram elements: training job, training code, parameters { }, dataset, execution environment, evaluation results, weight files, logs, run time
Dataset version management
Two components are provided:
• Datalake
  • Object storage
• Datasets
  • References to Datalake objects, plus metadata
Dataset version management
• The Annotation Tool annotates data in the Datalake and outputs the result as Datasets
Diagram: Datalake → Datasets
Dataset version management
When data is added, a separate datasets can be created.
Diagram: files and annotations ({} {} {} ...) grouped into separate datasets versions
Dataset version management
A tag splits datasets logically so that only specific elements are extracted.
Diagram: datasets with entries {} A, {} A, {} A, {} B, {} B, ..., selected by tagA and tagB
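The Datalake and Datasets split described above can be sketched as follows: a dataset item stores only a reference to a Datalake object plus annotation metadata, a new version is a new collection, and a tag selects a logical subset. All names here (`DatasetItem`, `datalake_file_id`, `filter_by_tag`) are hypothetical and do not reflect the actual ABEJA Platform API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetItem:
    """Hypothetical dataset entry: a reference, not a copy of the data."""
    datalake_file_id: str   # points at an object stored in the Datalake
    annotation: dict        # metadata produced by annotation
    tag: str = ""


def filter_by_tag(items, tag):
    """Extract only the items carrying the given tag."""
    return [item for item in items if item.tag == tag]


# Version 1 of the dataset.
v1 = (
    DatasetItem("file-1", {"label": "cat"}, "A"),
    DatasetItem("file-2", {"label": "dog"}, "B"),
)

# Adding data creates a separate dataset; v1 stays untouched.
v2 = v1 + (DatasetItem("file-3", {"label": "cat"}, "A"),)

print([item.datalake_file_id for item in filter_by_tag(v2, "A")])
```

Keeping versions immutable means a training job that recorded "v1" can always be rerun against exactly the same items.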
Dataset visualization
The dataset itself can be inspected.
Execution environment
The Platform provides Docker images that bundle the Python runtime, the major frameworks, and libraries.
Training code and parameters
• Python code that runs the training
• Implement a function that the Platform calls
• If a required Python library is missing from the Docker image, add it to requirements.txt
• Supplied parameters are available to the code as environment variables
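A minimal sketch of the last two points above: an entry-point function that the platform would call, reading its parameters from environment variables. The function name `handler`, its signature, and the variable names `EPOCHS`, `LEARNING_RATE`, and `OPTIMIZER` are assumptions made for illustration; the slides do not specify them.

```python
import os


def load_params(environ=None):
    """Read hyperparameters from environment variables, with defaults.

    The variable names are hypothetical; values arrive as strings and
    must be converted explicitly.
    """
    env = os.environ if environ is None else environ
    return {
        "epochs": int(env.get("EPOCHS", "10")),
        "learning_rate": float(env.get("LEARNING_RATE", "0.001")),
        "optimizer": env.get("OPTIMIZER", "adam"),
    }


def handler(context=None):
    """Hypothetical entry point invoked by the platform."""
    params = load_params()
    # A real handler would build the model and train here.
    return params


print(load_params({"EPOCHS": "3", "OPTIMIZER": "sgd"}))
```

Passing the mapping in explicitly, as in the last line, makes the parsing easy to check without touching the real process environment.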
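Training job definition version
The dataset to use, the training code, the parameters, and the execution environment are bundled, versioned, and managed in a runnable state.
Diagram elements: training code, parameters { }, dataset, execution environment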
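Running a training job
A training job is run by specifying a training job definition version, parameters, and an instance type.
Diagram: training job definition version (training code, parameters { }, dataset, execution environment) + override parameters { } + instance type → training job, with everything recorded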
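The relationship between a job definition version and a job run can be sketched like this: the definition is immutable, and a run combines it with override parameters and an instance type, recording the merged result. `JobDefinitionVersion`, `create_job`, and the example values are hypothetical names for illustration, not the platform's API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class JobDefinitionVersion:
    """Hypothetical immutable bundle of everything needed to run training."""
    version: int
    code_ref: str
    dataset_ref: str
    image: str
    default_params: dict = field(default_factory=dict)


def create_job(definition, instance_type, overrides=None):
    """Merge override parameters over the defaults and record the result."""
    params = {**definition.default_params, **(overrides or {})}
    return {
        "definition_version": definition.version,
        "instance_type": instance_type,
        "params": params,
    }


definition = JobDefinitionVersion(
    version=1,
    code_ref="train.py@abc123",
    dataset_ref="ds-v1",
    image="image:1",
    default_params={"epochs": 10, "lr": 0.001},
)

job = create_job(definition, "gpu-small", {"lr": 0.01})
print(job)
```

Since the merged parameters are stored on the job rather than written back to the definition, two jobs from the same definition version remain comparable.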
Running training jobs and managing results
• Uses Kubernetes (EKS)
• Previously Kubernetes on EC2
• nvidia-device-plugin is used to recognize GPUs
• Cluster autoscaling with spotinst
• Scaling across multiple instances is easy
• p2 and p3 family instances
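Training job
• Runs the training code as a k8s Job, passing it the parameters
• Uses the SDK to update the accuracy at every epoch
Diagram elements: instance type, execution environment, training code, parameters { }, SDK (saves accuracy), evaluation results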
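Management containers
Placed on the same node as the training job to save whatever the job outputs.
Diagram: the training job, Agent, and TensorBoard mount shared file storage on EFS; Agent monitors status and saves output files; TensorBoard publishes the results; Fluentd collects and saves logs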
Fluentd container
Saves the standard output written by the training job.
• Containers are placed with a k8s DaemonSet
• One Fluentd container runs on every node
• Basically watches /var/log/containers/*.log and saves those logs to external storage
• When a Pod disappears, its logs disappear too
Fluentd container
• Depending on the RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR setting, the container either becomes a noisy neighbor or hits its resource limit and is killed by the OOM killer
TensorBoard container
Visualizes the event logs written by the training job.
• Placed on the same node as the Job using Inter-Pod Affinity
• Mounts the same file system as the Job, then reads and displays the logs
• Exposed internally through a NodePort on a k8s Service; a Gateway in front of it publishes it with authentication
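The Inter-Pod Affinity placement above can be sketched as a Pod manifest built in Python. The `podAffinity` structure, the `kubernetes.io/hostname` topology key, and the `job-name` label that Kubernetes adds to a Job's Pods are standard Kubernetes; the image name, volume name, claim name, and mount path are placeholders, not the values used on ABEJA Platform.

```python
import json


def tensorboard_pod_spec(job_name):
    """Build a Pod manifest that must land on the same node as the Job's Pod."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": f"tensorboard-{job_name}"},
        "spec": {
            "affinity": {
                "podAffinity": {
                    "requiredDuringSchedulingIgnoredDuringExecution": [
                        {
                            # Pods created by a Job carry a job-name label.
                            "labelSelector": {"matchLabels": {"job-name": job_name}},
                            # Same hostname means same node.
                            "topologyKey": "kubernetes.io/hostname",
                        }
                    ]
                }
            },
            "containers": [
                {
                    "name": "tensorboard",
                    "image": "example/tensorboard:latest",  # placeholder image
                    "command": ["tensorboard", "--logdir", "/output"],
                    "volumeMounts": [{"name": "job-output", "mountPath": "/output"}],
                }
            ],
            "volumes": [
                {
                    "name": "job-output",
                    # Placeholder claim for the file system shared with the Job.
                    "persistentVolumeClaim": {"claimName": f"output-{job_name}"},
                }
            ],
        },
    }


print(json.dumps(tensorboard_pod_spec("train-1"), indent=2))
```

The JSON output can be applied as a manifest as is, since Kubernetes accepts JSON as well as YAML.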
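Agent container
Monitors the training job's status and records its start and end times.
• Polls the Job status and records it
• Mounts the same file system as the Job and saves the output files when the training job ends
Diagram: Agent monitors and updates the training job status and saves output files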
ML Ops
• Process improvement between ML Engineers and Development
• A cultural philosophy, practices, and tools that increase the ability to deliver models accurate enough to apply to the business
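Process improvement between ML Engineers and Development
Once a model meets the requirements, it is published so that other services can use it.
• Who writes the production code?
• Code written by a Data Scientist has to be rewritten
• After the rewrite, the accuracy is not reproduced…
• Models are updated too often → the service has to be updated more often too
• And so on…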
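Process improvement between ML Engineers and Development
The Development side can version-manage training results in combination with inference code.
Diagram: job 1 and job 2 each produce training results (weight files, execution environment, evaluation results); inference code combined with a job's weight files and execution environment forms a model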
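Process improvement between ML Engineers and Development
A model can be published as a Web API as is.
When the model is updated, the Web API can be updated safely.
Diagram: model 1 and model 2 (inference code, weight files, execution environment) are each deployed as a Web API; the endpoint can be switched between them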
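The endpoint switching shown above can be sketched as an endpoint that holds several deployed model versions and routes requests to whichever one is active. The `Endpoint` class and its methods are hypothetical and only illustrate the idea; they are not the platform's API.

```python
class Endpoint:
    """Hypothetical endpoint that routes to one of several deployed models."""

    def __init__(self):
        self._deployments = {}
        self._active = None

    def deploy(self, model_version, predict):
        """Register a model version without sending traffic to it yet."""
        self._deployments[model_version] = predict

    def switch(self, model_version):
        """Point the endpoint at a deployed version; unknown versions fail."""
        if model_version not in self._deployments:
            raise KeyError(f"model version not deployed: {model_version}")
        self._active = model_version

    def predict(self, payload):
        if self._active is None:
            raise RuntimeError("no active model version")
        return self._deployments[self._active](payload)


endpoint = Endpoint()
endpoint.deploy("model-1", lambda x: x + 1)
endpoint.switch("model-1")
print(endpoint.predict(3))

# Deploy the updated model alongside the old one, then switch over.
endpoint.deploy("model-2", lambda x: x * 2)
endpoint.switch("model-2")
print(endpoint.predict(3))
```

Because the old version stays deployed, switching back is a single call if the new model misbehaves.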
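Overall picture of model management on the Platform
Diagram elements: training job (training code, parameters { }, dataset, execution environment, evaluation results, weight files, logs, run time) and model (inference code, weight files, execution environment)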
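Summary
• Experiment management is tedious, but skipping it causes trouble later
• The inputs to training (the dataset used, the training code, the execution environment, and so on) are bundled and version-managed
• Saving of outputs is implemented on the Platform side so that it burdens developers as little as possible
• Models put into service are linked to training job results to ensure traceability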