MLOps on ABEJA Platform: LINE×ABEJA MLOps Study @FUKUOKA

Yusuke Ueno

April 24, 2019

Transcript

  1. Copyright © 2019 ABEJA, Inc. All rights reserved. What is ML Ops?

    My impression of DevOps: • Process improvement between Development and Operation • A cultural philosophy, practices, and tools that raise the ability to deliver applications
  2. ML Ops

    I will define it as follows: • Process improvement between ML Engineer and Development • A cultural philosophy, practices, and tools that raise the ability to deliver models accurate enough to apply to the business
  3. Training: iterative work

    • Writing and revising training code • Trying different optimizers • Tuning hyperparameters • Revising the sampling method • Using different library versions • Changing the random seed
  4. Managing experiments matters

    Unless the conditions and results of every single experiment are recorded, the run that achieved good accuracy cannot be reproduced later. Experiments need to be recorded and kept browsable.
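  5. Record: • Dataset • Code • Parameters • Execution environment • Experiment results (evaluation metrics) • Weight parameters • Logs • Execution time

    Browse: • Comparison of experiment results • Detailed information (experiment conditions, experiment results, visualizations such as images) • Sharing among team members • Searching and browsing past experiments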
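
The items to record listed on this slide can be pictured as one record per training run. The following is a minimal sketch under stated assumptions, not the ABEJA Platform data model: the class `ExperimentRecord`, its field names, the helper `best_run`, and all sample values are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ExperimentRecord:
    """One training run: what went in and what came out (hypothetical schema)."""
    dataset: str            # dataset name or version
    code_version: str       # e.g. a commit hash of the training code
    params: dict            # hyperparameters used for this run
    environment: str        # e.g. a Docker image tag
    metrics: dict = field(default_factory=dict)  # evaluation metrics
    weights_path: str = ""  # where the trained weights were saved
    log_path: str = ""      # where the logs were saved
    duration_sec: float = 0.0


def best_run(records, metric):
    """Return the record with the highest value of the given metric."""
    return max(records, key=lambda r: r.metrics[metric])


# Two runs that differ only in the learning rate (made-up values).
runs = [
    ExperimentRecord("images-v1", "a1b2c3", {"lr": 0.01, "seed": 0},
                     "train:py3", {"accuracy": 0.81}),
    ExperimentRecord("images-v1", "a1b2c3", {"lr": 0.001, "seed": 0},
                     "train:py3", {"accuracy": 0.88}),
]
best = best_run(runs, "accuracy")
print(json.dumps(asdict(best), indent=2))
```

Because every run stores its conditions next to its results, the best run can be looked up later together with the exact parameters needed to reproduce it.
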
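  6. Overview of experiment management

    Diagram labels: training code, parameters, evaluation results, weight file, logs, execution time, training job, dataset, execution environment, sharing among team members, version control, visualization, comparison between training jobs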
  7. Overview of experiment management

    Diagram labels: training code, parameters, evaluation results, weight file, logs, execution time, training job, dataset, execution environment
  8. Dataset versioning

    Two components are provided • Datalake: object storage • Datasets: references to Datalake objects plus metadata
  9. Dataset versioning

    • The Annotation Tool annotates data in the Datalake and outputs the result as Datasets (diagram labels: Datalake, Datasets)
  10. Dataset versioning

    When data is added, the result can be created as a separate datasets (diagram labels: files, annotations, datasets, version)
  11. Dataset versioning

    Tags logically partition a datasets so that only specific items are extracted (diagram labels: datasets, tagA, tagB, items marked A or B)
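
The dataset versioning slides describe Datasets as references to Datalake objects plus metadata, with tags used to pull out a logical subset. A minimal sketch of that idea follows; the item layout, the `datalake://` reference strings, and the function `select_by_tag` are assumptions made for illustration and are not the ABEJA Platform API.

```python
def select_by_tag(items, tag):
    """Return only the dataset items that carry the given tag."""
    return [item for item in items if tag in item["tags"]]


# Each item points at a stored object and holds its annotation as metadata.
dataset_v1 = [
    {"file_ref": "datalake://channel-1/img-001", "annotation": {"label": "cat"}, "tags": ["A"]},
    {"file_ref": "datalake://channel-1/img-002", "annotation": {"label": "dog"}, "tags": ["A"]},
    {"file_ref": "datalake://channel-1/img-003", "annotation": {"label": "cat"}, "tags": ["B"]},
]

# Adding data produces a new dataset instead of mutating the old one,
# so a job trained on dataset_v1 stays reproducible.
dataset_v2 = dataset_v1 + [
    {"file_ref": "datalake://channel-1/img-004", "annotation": {"label": "dog"}, "tags": ["B"]},
]

subset_a = select_by_tag(dataset_v2, "A")
subset_b = select_by_tag(dataset_v2, "B")
print(len(dataset_v1), len(dataset_v2), len(subset_a), len(subset_b))
```
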
  12. Overview of experiment management

    Diagram labels: training code, parameters, evaluation results, weight file, logs, execution time, training job, dataset, execution environment
  13. Execution environment

    The Platform provides Docker images that bundle the Python runtime, the major frameworks, and libraries
  14. Training code and parameters

    • Python code that runs the training • Implement the function that the Platform calls • If a required Python library is missing from the Docker image, add it to requirements.txt • Supplied parameters can be read inside the code as environment variables
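
Reading supplied parameters from environment variables, as the last point on this slide describes, might look like the sketch below. The variable names `EPOCHS`, `LEARNING_RATE`, and `OPTIMIZER`, their defaults, and the function `load_params` are assumptions for illustration; the slide does not name the actual variables.

```python
import os


def load_params(environ=os.environ):
    """Read hyperparameters from environment variables, falling back to defaults."""
    return {
        # Environment variables are always strings, so convert explicitly.
        "epochs": int(environ.get("EPOCHS", "10")),
        "learning_rate": float(environ.get("LEARNING_RATE", "0.001")),
        "optimizer": environ.get("OPTIMIZER", "adam"),
    }


# Passing a plain dict stands in for the process environment in this example.
params = load_params({"EPOCHS": "3", "LEARNING_RATE": "0.01"})
print(params)
```

Keeping the parsing in one function makes the training code independent of how the values were supplied, which also simplifies local testing.
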
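  15. Run a training job by specifying a training job definition version, parameters, and an instance type

    Diagram labels: training job execution, training code, parameters, training job definition version, override parameters, training job, dataset, instance type, recording, execution environment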
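
The relationship between a job definition version and override parameters can be sketched as a merge in which the overrides win. This is a hedged illustration only: `build_job_request`, the dictionary layout, and the instance type strings are hypothetical and do not come from the slides.

```python
def build_job_request(definition_version, overrides=None, instance_type="cpu-1"):
    """Combine a job definition version with per-run overrides."""
    # Defaults come from the definition version; overrides replace them per key.
    params = {**definition_version["params"], **(overrides or {})}
    return {
        "definition_version": definition_version["version"],
        "instance_type": instance_type,
        "params": params,
    }


definition_v3 = {"version": 3, "params": {"EPOCHS": "10", "LEARNING_RATE": "0.001"}}
request = build_job_request(definition_v3, {"LEARNING_RATE": "0.01"}, instance_type="gpu-1")
print(request)
```

Recording the resolved request alongside the job means every run can be traced back to the exact code version and parameter values it used.
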
  16. Overview of experiment management

    Diagram labels: training code, parameters, evaluation results, weight file, logs, execution time, training job, dataset, execution environment
  17. Running training jobs and managing results

    • Uses Kubernetes (EKS) • Previously Kubernetes on EC2 • Uses nvidia-device-plugin to recognize GPUs • Cluster autoscaling with Spotinst • Scaling across multiple instances is easy • p2 and p3 family instances
  18. Overview of experiment management

    Diagram labels: training code, parameters, evaluation results, weight file, logs, execution time, training job, dataset, execution environment
  19. Training job

    • Runs as a k8s Job, passing the parameters to the training code • The SDK is used to update the accuracy after every epoch (diagram labels: instance type, execution environment, SDK, save accuracy, training code, parameters, evaluation results)
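
A training job run as a k8s Job with parameters passed as environment variables could be described by a manifest like the one built below. This is a sketch under assumptions, not the Platform's actual manifest: the function `build_training_job`, the job name, and the image name are hypothetical. The `nvidia.com/gpu` resource name is the one advertised by the NVIDIA device plugin, and `node.kubernetes.io/instance-type` is a standard node label used here as one possible way to select an instance type.

```python
import json


def build_training_job(name, image, params, gpus=1, instance_type=None):
    """Build a Kubernetes Job manifest (as a dict) for one training run."""
    pod_spec = {
        "restartPolicy": "Never",
        "containers": [{
            "name": "train",
            "image": image,
            # Parameters reach the training code as environment variables.
            "env": [{"name": k, "value": str(v)} for k, v in sorted(params.items())],
            # GPU resource exposed by the NVIDIA device plugin.
            "resources": {"limits": {"nvidia.com/gpu": gpus}},
        }],
    }
    if instance_type is not None:
        pod_spec["nodeSelector"] = {"node.kubernetes.io/instance-type": instance_type}
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {"backoffLimit": 0, "template": {"spec": pod_spec}},
    }


manifest = build_training_job(
    "train-job-0001", "example/train:latest",
    {"EPOCHS": 3, "LEARNING_RATE": 0.01}, gpus=1, instance_type="p3.2xlarge",
)
print(json.dumps(manifest, indent=2))
```
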
  20. Overview of experiment management

    Diagram labels: training code, parameters, evaluation results, weight file, logs, execution time, training job, dataset, execution environment
  21. Management containers

    Placed on the same node as the training job to save whatever it outputs. Diagram labels: shared file storage on EFS, training job, Agent, TensorBoard, Fluentd, mount, status monitoring, save output files, publish, collect logs, save
  22. Fluentd container

    Saves the standard output of training jobs • Deployed as a k8s DaemonSet • One Fluentd container runs on every node • Basically it watches /var/log/containers/*.log and saves those logs to external storage • Once a Pod is gone, its logs are gone too
  23. Fluentd container

    • Depending on the value of RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR, the container either becomes a noisy neighbor or is killed by the OOM Killer because of its resource limit
  24. TensorBoard container

    Visualizes the event logs that training jobs output • Placed on the same node as the Job using inter-pod affinity • Mounts the same file system as the Job, then reads and displays the logs • Exposed internally through a k8s Service NodePort; an in-house Gateway publishes it with authentication
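
Co-locating a TensorBoard pod with a Job through inter-pod affinity and exposing it through a NodePort Service can be sketched with the manifest fragments below. The helper names, image name, log directory, and the `app` label are assumptions; volume mounts are omitted. The sketch relies on the `job-name` label that the Kubernetes Job controller adds to a Job's pods and on TensorBoard's default port 6006.

```python
def tensorboard_pod_spec(job_name, image, log_dir="/mnt/output"):
    """Pod spec fragment that schedules TensorBoard next to a Job's pod."""
    return {
        "affinity": {"podAffinity": {
            # Hard requirement: only nodes already running a pod of this Job.
            "requiredDuringSchedulingIgnoredDuringExecution": [{
                "labelSelector": {"matchLabels": {"job-name": job_name}},
                "topologyKey": "kubernetes.io/hostname",
            }],
        }},
        "containers": [{
            "name": "tensorboard",
            "image": image,
            "command": ["tensorboard", "--logdir", log_dir,
                        "--host", "0.0.0.0", "--port", "6006"],
            "ports": [{"containerPort": 6006}],
        }],
    }


def tensorboard_service(name, selector):
    """NodePort Service that makes the TensorBoard pod reachable inside the cluster network."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "NodePort",
            "selector": selector,
            "ports": [{"port": 6006, "targetPort": 6006}],
        },
    }


pod_spec = tensorboard_pod_spec("train-job-0001", "example/tensorboard:latest")
service = tensorboard_service("tb-train-job-0001", {"app": "tb-train-job-0001"})
print(pod_spec["affinity"])
print(service["spec"])
```

Using `kubernetes.io/hostname` as the topology key is what restricts placement to the same node rather than merely the same zone.
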
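  25. Agent container

    Monitors the training job status and records start and end times • Polls and records the Job status • Mounts the same file system as the Job and saves the output files when the training job finishes (diagram labels: training job, Agent, status monitoring and update, save output files)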
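
The polling behaviour described for the Agent can be sketched as a loop that records status changes and the start and end times. Everything named here is hypothetical: `watch_job`, the status strings, and the injected `get_status` callable stand in for whatever the real Agent queries.

```python
import time

# Statuses after which polling stops (illustrative names).
TERMINAL_STATUSES = {"Complete", "Failed"}


def watch_job(get_status, on_update, interval_sec=5.0, sleep=time.sleep, clock=time.time):
    """Poll a job's status until it finishes, reporting each change."""
    started_at = clock()
    last_status = None
    while True:
        status = get_status()
        if status != last_status:
            on_update(status)  # record only transitions, not every poll
            last_status = status
        if status in TERMINAL_STATUSES:
            # The real Agent would save the output files at this point.
            return {"status": status, "started_at": started_at, "finished_at": clock()}
        sleep(interval_sec)


# Simulated job: the status source is an iterator and sleeping is skipped.
fake_statuses = iter(["Pending", "Running", "Running", "Complete"])
updates = []
result = watch_job(lambda: next(fake_statuses), updates.append, sleep=lambda _: None)
print(updates, result["status"])
```

Injecting `sleep` and `clock` keeps the loop testable without waiting for real time to pass.
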
  26. Overview of experiment management

    Diagram labels: training code, parameters, evaluation results, weight file, logs, execution time, training job, dataset, execution environment
  27. ML Ops

    • Process improvement between ML Engineer and Development • A cultural philosophy, practices, and tools that raise the ability to deliver models accurate enough to apply to the business
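  28. Process improvement between ML Engineer and Development

    Once a model meets the requirements, it is published so that other services can use it • Who writes the production code? • Code written by data scientists has to be rewritten • After the rewrite, the accuracy is not reproduced… • Models are updated too often → more service updates • and so on…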
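  29. Process improvement between ML Engineer and Development

    The Development side can version-control training results in combination with inference code. Diagram labels: inference code, training results, job 1 and job 2 (each with weight file, execution environment, evaluation results), model (weight file, execution environment)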
  30. Process improvement between ML Engineer and Development

    A model can be published as a Web API as is. When the model is updated, the Web API can also be updated safely. Diagram labels: inference code, model, weight file, execution environment, model 1, model 2, Web API, deploy, endpoint, switchable
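
The switchable endpoint on this slide can be illustrated with a small in-process stand-in: several model versions stay deployed and a single pointer decides which one serves requests. The class `Endpoint` and its methods are hypothetical and only mimic the idea; they are not the Platform's deployment API.

```python
class Endpoint:
    """Routes requests to one of several deployed model versions."""

    def __init__(self):
        self._deployments = {}
        self._active = None

    def deploy(self, version, predict_fn):
        """Register a model version without sending traffic to it yet."""
        self._deployments[version] = predict_fn
        if self._active is None:
            self._active = version

    def switch_to(self, version):
        """Point the endpoint at an already deployed version."""
        if version not in self._deployments:
            raise KeyError(f"version {version!r} is not deployed")
        self._active = version

    def predict(self, x):
        return self._deployments[self._active](x)


endpoint = Endpoint()
endpoint.deploy("model-1", lambda x: x + 1)   # stand-in for model 1
endpoint.deploy("model-2", lambda x: x * 2)   # stand-in for model 2
before = endpoint.predict(10)
endpoint.switch_to("model-2")
after = endpoint.predict(10)
print(before, after)
```

Because the previous version stays deployed, switching back is a single call, which is what makes the update safe for callers of the Web API.
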
  31. Overall model management on the Platform

    Diagram labels: training code, parameters, evaluation results, weight file, logs, execution time, training job, dataset, execution environment, inference code, weight file, execution environment
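  32. Summary

    • Experiment management is tedious, but skipping it causes trouble later • Version together the inputs to training: the dataset used, the training code, the execution environment, and so on • Saving outputs is implemented on the Platform side so that it burdens developers as little as possible • Link each model that goes into service to the results of its training job to ensure traceability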