ABEJA Platform での MLOps LINE×ABEJA MLOps Study @FUKUOKA
by
Yusuke Ueno
Link
Embed
Share
Beginning
This slide
Copy link URL
Copy link URL
Copy iframe embed code
Copy iframe embed code
Copy javascript embed code
Copy javascript embed code
Share
Tweet
Share
Tweet
Slide 1
Slide 1 text
Software Engineer at ABEJA Yusuke Ueno ABEJA Platform Ͱͷ ML Ops
Slide 2
Slide 2 text
ࠓ͢͜ͱ • ABEJA Platform ͱʁ • ػցֶशͷ࣮ݧཧʹ͍ͭͯ • ABEJA Platform Ͱͷ࣮ݧཧͱͦͷ࣮
Slide 3
Slide 3 text
ABEJA Platform ͱʁ
Slide 4
Slide 4 text
Copyright © 2019 ABEJA, Inc. All rights reserved.
Slide 5
Slide 5 text
No content
Slide 6
Slide 6 text
Copyright © 2019 ABEJA, Inc. All rights reserved. نײ
Slide 7
Slide 7 text
Copyright © 2019 ABEJA, Inc. All rights reserved. ML Ops ͱ? DevOps ͜ͷΑ͏ͳҹ • Development ͱ Operation ؒͷϓϩηεվળ • ΞϓϦέʔγϣϯͷσϦόϦೳྗΛ͋͛ΔจԽతֶɺ ϓϥΫςΟεɺπʔϧ
Slide 8
Slide 8 text
Copyright © 2019 ABEJA, Inc. All rights reserved. ML Ops ͜͏ఆٛͯ͠Έ·͢ • ML Engineer ͱ Development ؒͷϓϩηεվળ • Ϗδωεʹద༻Ͱ͖Δਫ਼ΛͭϞσϧΛఏڙ͢Δೳྗ Λ্͛ΔจԽతֶɺϓϥΫςΟεɺπʔϧ
Slide 9
Slide 9 text
Copyright © 2019 ABEJA, Inc. All rights reserved. ࠓֶश෦ʹ͍ͭͯ
Slide 10
Slide 10 text
Copyright © 2019 ABEJA, Inc. All rights reserved. ֶश ΠςϨʔςΟϒͳ࡞ۀ • ֶशίʔυͷ࡞ɾमਖ਼ • ҟͳΔΦϓςΟϚΠβͰͷࢼߦ • ϋΠύʔύϥϝʔλͷௐ • αϯϓϦϯάํ๏ͷमਖ਼ • ҟͳΔόʔδϣϯͷϥΠϒϥϦͷ༻ • ϥϯμϜγʔυͷมߋ
Slide 11
Slide 11 text
Copyright © 2019 ABEJA, Inc. All rights reserved. ࣮ݧͷཧ͕ॏཁ ҰճҰճͷ࣮ݧͷ݅ͱ݁ՌΛه͍ͯ͠ͳ͍ͱɺޙͰਫ਼ ͕ྑ͔ͬͨ࣌ͷ࣮ݧΛ࠶ݱͰ͖ͳ͍ هͯ͠ɺӾཡͰ͖ΔΑ͏ʹ͓ͯ͘͠ඞཁ͕͋Δ
Slide 12
Slide 12 text
• σʔληοτ • ίʔυ • ύϥϝʔλ • ࣮ߦڥ • ࣮ݧ݁ՌʢධՁࢦඪʣ • ॏΈύϥϝʔλ • ϩά • ࣮ߦ࣌ؒ ه • ࣮ݧ݁Ռͷൺֱ • ৄࡉใͷදࣔ • ࣮ݧ݅ • ࣮ݧ݁Ռ • ՄࢹԽʢը૾ͳͲʣ • ϝϯόʔؒͰͷڞ༗ • աڈͷ࣮ݧͷݕࡧ • Ӿཡ Ӿཡ
Slide 13
Slide 13 text
Copyright © 2019 ABEJA, Inc. All rights reserved. ࣮ݧཧͷશମ૾ { } ֶशίʔυ ύϥϝʔλ ධՁ݁Ռ ॏΈϑΝΠϧ ϩά ࣮ߦ࣌ؒ ֶशδϣϒ σʔληοτ ࣮ߦڥ ϝϯόʔؒͰͷڞ༗ όʔδϣϯཧ ՄࢹԽ ֶशδϣϒؒͰͷൺֱ
Slide 14
Slide 14 text
Copyright © 2019 ABEJA, Inc. All rights reserved. ࣮ݧཧͷશମ૾ { } ֶशίʔυ ύϥϝʔλ ධՁ݁Ռ ॏΈϑΝΠϧ ϩά ࣮ߦ࣌ؒ ֶशδϣϒ σʔληοτ ࣮ߦڥ
Slide 15
Slide 15 text
Copyright © 2019 ABEJA, Inc. All rights reserved. σʔληοτͷόʔδϣϯཧ ̎ͭͷίϯϙʔωϯτΛ༻ҙ • Datalake • ΦϒδΣΫτετϨʔδ • Datasets • Datalake ΦϒδΣΫτͷࢀরใͱϝλσʔλ
Slide 16
Slide 16 text
Copyright © 2019 ABEJA, Inc. All rights reserved. σʔληοτͷόʔδϣϯཧ • Annotation Tool ʹͯ Datalake ͷσʔλʹରͯ͠Ξϊςʔ γϣϯͨ݁͠ՌΛ Datasets ͱͯ͠ग़ྗ %BUBMBLF %BUBTFUT
Slide 17
Slide 17 text
Copyright © 2019 ABEJA, Inc. All rights reserved. σʔληοτͷόʔδϣϯཧ σʔλΛՃͨ͠߹ɺผͷ datasets ͱͯ͠࡞Մೳ \^ \^ \^ ɾɾɾ GJMFT BOOPUBUJPOT EBUBTFUT WFSTJPO \^ \^ WFSTJPO
Slide 18
Slide 18 text
Copyright © 2019 ABEJA, Inc. All rights reserved. σʔληοτͷόʔδϣϯཧ tag Ͱ datasets Λཧతʹׂ͠ಛఆͷཁૉͷΈΛநग़ ɾɾɾ EBUBTFUT UBH" UBH# \^ " \^ " \^ " \^ # \^ #
Slide 19
Slide 19 text
Copyright © 2019 ABEJA, Inc. All rights reserved. σʔληοτͷՄࢹԽ σʔληοτࣗମͷ֬ೝ͕Մೳ
Slide 20
Slide 20 text
Copyright © 2019 ABEJA, Inc. All rights reserved. ࣮ݧཧͷશମ૾ { } ֶशίʔυ ύϥϝʔλ ධՁ݁Ռ ॏΈϑΝΠϧ ϩά ࣮ߦ࣌ؒ ֶशδϣϒ σʔληοτ ࣮ߦڥ
Slide 21
Slide 21 text
Copyright © 2019 ABEJA, Inc. All rights reserved. ࣮ߦڥ Platform Ͱ Python RuntimeɺओཁͳϑϨʔϜϫʔΫɺϥΠ ϒϥϦશ෦ೖΓͷ Docker Image Λఏڙ
Slide 22
Slide 22 text
Copyright © 2019 ABEJA, Inc. All rights reserved. ֶशίʔυɾύϥϝʔλ • ֶशΛ࣮ߦ͢Δ Python ίʔυ • Platform ্Ͱݺͼग़͞ΕΔؔΛ࣮ • Docker Image ʹඞཁͳ Python ϥΠϒϥϦ͕ͳ͍߹ʹ requirements.txt ʹՃ • ༩͑ͨύϥϝʔλڥมͱͯ͠ίʔυͰऔಘՄೳ
Slide 23
Slide 23 text
Copyright © 2019 ABEJA, Inc. All rights reserved. ༻͢Δσʔληοτɺֶशίʔυɺύϥϝʔλɺ࣮ߦ ڥΛ·ͱΊͯɺ࣮ߦͰ͖Δঢ়ଶͰόʔδϣχϯάͯ͠ཧ ֶशδϣϒఆٛόʔδϣϯ ֶशίʔυ { } ύϥϝʔλ σʔληοτ ࣮ߦڥ
Slide 24
Slide 24 text
Copyright © 2019 ABEJA, Inc. All rights reserved. ֶशδϣϒఆٛόʔδϣϯͱύϥϝʔλɺΠϯελϯελ ΠϓΛࢦఆֶͯ͠शδϣϒΛ࣮ߦ ֶशδϣϒ࣮ߦ ֶशίʔυ { } ύϥϝʔλ ֶशδϣϒఆٛόʔδϣϯ { } ্ॻ͖ύϥϝʔλ ֶशδϣϒ σʔληοτ ΠϯελϯελΠϓ ʴ ه ࣮ߦڥ
Slide 25
Slide 25 text
Copyright © 2019 ABEJA, Inc. All rights reserved. ࣮ݧཧͷશମ૾ { } ֶशίʔυ ύϥϝʔλ ධՁ݁Ռ ॏΈϑΝΠϧ ϩά ࣮ߦ࣌ؒ ֶशδϣϒ σʔληοτ ࣮ߦڥ
Slide 26
Slide 26 text
Copyright © 2019 ABEJA, Inc. All rights reserved. ֶशδϣϒͷ࣮ߦͱ݁Ռͷཧ • kubernetes ( EKS ) Λ༻ • Ҏલ kubernetes on EC2 • nvidia-device-plugin Λ༻ͯ͠ GPU Λೝࣝ • spotinst ͰΫϥελΦʔτεέʔϦϯά • ָʹෳͷΠϯελϯεͰͷεέʔϧ͕Մೳ • p2 ܥɺ p3 ܥΠϯελϯε
Slide 27
Slide 27 text
Copyright © 2019 ABEJA, Inc. All rights reserved. ࣮ݧཧͷશମ૾ { } ֶशίʔυ ύϥϝʔλ ධՁ݁Ռ ॏΈϑΝΠϧ ϩά ࣮ߦ࣌ؒ ֶशδϣϒ σʔληοτ ࣮ߦڥ
Slide 28
Slide 28 text
Copyright © 2019 ABEJA, Inc. All rights reserved. ֶशδϣϒ • k8s ͷ Job ͱֶͯ͠शίʔυʹύϥϝʔλΛ༩࣮͑ͯߦ • SDK Λ༻ͯ͠ɺΤϙοΫ͝ͱͷਫ਼Λߋ৽ ΠϯελϯελΠϓ ࣮ߦڥ 4%, ਫ਼Λอଘ ֶशίʔυ { } ύϥϝʔλ ධՁ݁Ռ
Slide 29
Slide 29 text
Copyright © 2019 ABEJA, Inc. All rights reserved. ࣮ݧཧͷશମ૾ { } ֶशίʔυ ύϥϝʔλ ධՁ݁Ռ ॏΈϑΝΠϧ ϩά ࣮ߦ࣌ؒ ֶशδϣϒ σʔληοτ ࣮ߦڥ
Slide 30
Slide 30 text
Copyright © 2019 ABEJA, Inc. All rights reserved. ཧܥίϯςφ ֶशδϣϒͱಉ͡ϊʔυʹஔ͠ɺग़ྗͱͳΔͷΛอଘ &'4Ͱͷڞ༗ϑΝΠϧετϨʔδ ֶशδϣϒ "HFOU 5FOTPS#PBSE 'MVFOUE Ϛϯτ εςʔλεࢹ ग़ྗϑΝΠϧอଘ ެ։ ϩάΛऔಘ อଘ
Slide 31
Slide 31 text
Copyright © 2019 ABEJA, Inc. All rights reserved. Fluentd ίϯςφ ֶशδϣϒ͕ग़ྗ͢Δඪ४ग़ྗΛอଘ • k8s ͷ DaemonSet ͰίϯςφΛஔ • શͯͷϊʔυʹ̍ͭͷ Fluentd ίϯςφΛ࣮ߦ • جຊతʹ /var/log/containers/*.log Λࢹͯ͠ɺ͜ΕΒ ͷϩάΛ֎෦ͷετϨʔδʹอଘ • Pod ͕ফ͑Δͱϩάফ͑ͯ͠·͏
Slide 32
Slide 32 text
Copyright © 2019 ABEJA, Inc. All rights reserved. Fluentd ίϯςφ • RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR ͷઃ ఆ࣍ୈͰɺNoisy Neighbor ʹͳΔ͔ɺResource Limit ʹΑΓ OOM Killer Ͱࡴ͞Εͯ͠·͏
Slide 33
Slide 33 text
Copyright © 2019 ABEJA, Inc. All rights reserved. TensorBoard ίϯςφ ֶशδϣϒ͕ग़ྗ͢ΔΠϕϯτϩάͷՄࢹԽ • Inter-Pod Affinity Λ༻ͯ͠ Job ͱಉ͡ϊʔυʹஔ • Job ͱಉ͡ϑΝΠϧγεςϜΛϚϯτ͠ɺϩάΛಡΈ ࠐΈදࣔ • k8s ͷ Service ͷ Node Port Ͱ internal ʹ expose ͠ɺ ͷ Gateway ͕ೝূ͖Ͱެ։
Slide 34
Slide 34 text
Copyright © 2019 ABEJA, Inc. All rights reserved. Agent ίϯςφ ֶशδϣϒͷεςʔλεࢹɾ։࢝ / ऴྃ࣌ࠁΛه • Job ͷεςʔλεΛϙʔϦϯάͯ͠ه • Job ͱಉ͡ϑΝΠϧγεςϜΛϚϯτ͠ɺֶशδϣϒ ͷऴྃͱͱʹग़ྗϑΝΠϧΛอଘ ֶशδϣϒ "HFOU εςʔλεࢹɾߋ৽ ग़ྗϑΝΠϧอଘ
Slide 35
Slide 35 text
Copyright © 2019 ABEJA, Inc. All rights reserved. ࣮ݧཧͷશମ૾ { } ֶशίʔυ ύϥϝʔλ ධՁ݁Ռ ॏΈϑΝΠϧ ϩά ࣮ߦ࣌ؒ ֶशδϣϒ σʔληοτ ࣮ߦڥ
Slide 36
Slide 36 text
Copyright © 2019 ABEJA, Inc. All rights reserved.
Slide 37
Slide 37 text
Copyright © 2019 ABEJA, Inc. All rights reserved. ML Ops • ML Engineer ͱ Development ؒͷϓϩηεվળ • Ϗδωεʹద༻Ͱ͖Δਫ਼ΛͭϞσϧΛఏڙ͢Δೳྗ Λ্͛ΔจԽతֶɺϓϥΫςΟεɺπʔϧ
Slide 38
Slide 38 text
Copyright © 2019 ABEJA, Inc. All rights reserved. ML Engineer ͱ Development ؒͷϓϩηεվળ ཁٻΛຬͨ͢Ϟσϧ͕Ͱ͖ΔͱଞͷαʔϏε͕ར༻Α͏ʹެ։ • ୭͕ຊ൪͚ͷίʔυΛॻ͔͘ʁ • Data Scientist ͕ॻ͍ͨίʔυΛॻ͖͞ͳ͍ͱ͍͚ͳ͍ • ॻ͖͢ͱਫ਼͕࠶ݱ͠ͳ͍… • Ϟσϧͷߋ৽͕ଟ͗͢ → αʔϏεͷߋ৽ճ૿Ճ • ʑ…
Slide 39
Slide 39 text
Copyright © 2019 ABEJA, Inc. All rights reserved. ML Engineer ͱ Development ؒͷϓϩηεվળ Development ଆֶश݁ՌͱਪίʔυͱΈ߹Θͤͯ όʔδϣϯཧՄೳ ਪίʔυ ֶश݁Ռ ॏΈϑΝΠϧ ࣮ߦڥ ධՁ݁Ռ ॏΈϑΝΠϧ ࣮ߦڥ ධՁ݁Ռ δϣϒ̍ δϣϒ̎ ॏΈϑΝΠϧ ࣮ߦڥ Ϟσϧ
Slide 40
Slide 40 text
Copyright © 2019 ABEJA, Inc. All rights reserved. ML Engineer ͱ Development ؒͷϓϩηεվળ Ϟσϧͦͷ·· Web API ͱͯ͠ެ։Մೳ Ϟσϧߋ৽࣌ Web API Λ҆શʹߋ৽Մೳ ਪίʔυ Ϟσϧ ॏΈϑΝΠϧ ࣮ߦڥ ॏΈϑΝΠϧ ࣮ߦڥ Ϟσϧ̍ Ϟσϧ̎ ਪίʔυ 8FC"1* 8FC"1* σϓϩΠ ΤϯυϙΠϯτ Γସ͑Մೳ
Slide 41
Slide 41 text
Copyright © 2019 ABEJA, Inc. All rights reserved. Platform ͰͷϞσϧཧશମ { } ֶशίʔυ ύϥϝʔλ ධՁ݁Ռ ॏΈϑΝΠϧ ϩά ࣮ߦ࣌ؒ ֶशδϣϒ σʔληοτ ࣮ߦڥ ਪίʔυ ॏΈϑΝΠϧ ࣮ߦڥ
Slide 42
Slide 42 text
Copyright © 2019 ABEJA, Inc. All rights reserved. ·ͱΊ • ࣮ݧཧ໘͕ͩɺΒͳ͍ͱޙͰࠔΔ • ֶशͷೖྗͱͳΔ༻͢Δσʔληοτɺֶशίʔυɺ ࣮ߦڥͳͲΛ·ͱΊͯόʔδϣϯཧ • ग़ྗ݁Ռͷอଘग़དྷΔ͚ͩ։ൃऀʹෛ୲Λ͔͚ͳ͍ܗ Ͱ Platform ଆͰ࣮ • αʔϏεԽ͢ΔϞσϧͱֶशδϣϒͷ݁Ռͷඥ͚ͯτ ϨʔαϏϦςΟΛ୲อ