Kanazawa_AI.pdf

Transcript

  1. The backend behind Mercari's photo search, and the outlook for Edge AI technology in products. Mercari, Inc. Nakagawa Hirofumi

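  2. Self-introduction: Nakagawa Hirofumi
    • From Toyama Prefecture
    • Joined Mercari in July 2017; ex-MIRACLE LINUX, ex-Cerevo, ex-mixi, ex-Drivemode (co-founder)
    • Team: SRE → AI/ML
    • A jack-of-all-trades who does everything from device driver development to front-end development. Lately ML and robotics are becoming the main battlefield.
    • Twitter: hnakagawa14, GitHub: hnakagawa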
  3. Introduction

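  4. What is photo search
    • Photo search is what is commonly called an image search feature
    • It searches for items from the app, starting from a photo
    • Items can be found from an image even when the item name is unknown
    • Video link: https://youtu.be/kTni8EvOCgI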
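  5. How basic photo search works
    1. Use Deep Neural Networks (DNN) to obtain feature vectors from item images
    2. Add the obtained feature vectors to an Approximate Nearest Neighbor Index (ANN Index) to build the image index
    3. At search time, obtain a feature vector from the query image through the same DNN and search the ANN Index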
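The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the feature vectors are hand-written stand-ins for DNN output, and the hypothetical `BruteForceIndex` performs an exact cosine-similarity scan instead of a real approximate nearest neighbor search, which the slides do not describe in detail.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class BruteForceIndex:
    """Exact nearest-neighbor index; a stand-in for a real ANN index."""

    def __init__(self):
        self._items = []  # list of (item_id, feature_vector)

    def add(self, item_id, vector):
        """Step 2: add a feature vector to the index."""
        self._items.append((item_id, list(vector)))

    def search(self, query, k=3):
        """Step 3: return the ids of the k most similar items."""
        scored = [(cosine_similarity(query, vec), item_id)
                  for item_id, vec in self._items]
        scored.sort(reverse=True)
        return [item_id for _, item_id in scored[:k]]


# Step 1 is assumed: these vectors stand in for DNN feature vectors.
index = BruteForceIndex()
index.add("sneaker", [0.9, 0.1, 0.0])
index.add("handbag", [0.1, 0.9, 0.1])
index.add("boot", [0.8, 0.2, 0.1])

result = index.search([1.0, 0.1, 0.0], k=2)
print(result)
```

A production ANN index trades a little recall for much lower query time; the interface (add vectors, then query by vector) stays the same.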
  6. What is Kubernetes
    • Kubernetes (hereafter k8s) is an open-source container orchestration system
    • k8s has a feature called Custom Resource Definition that lets you define your own resources; developers can extend k8s through it
    • Amazon Elastic Container Service for Kubernetes (Amazon EKS) is a managed k8s service that takes care of managing the control plane
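  7. What is Custom Resource Definition
    • Custom Resource Definition (hereafter CRD) is the k8s feature for defining your own resources
    • It consists of CRD resources and a custom controller
    • The custom controller controls the state of the cluster according to the lifecycle and state of the CRD resources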
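The controller idea can be illustrated without a cluster. The following is a minimal sketch, assuming plain dictionaries in place of the k8s API: `reconcile` compares the desired state (what the custom resources declare) with the actual state and returns the actions a controller would take. The function names and resource names are hypothetical and are not taken from Lykeion.

```python
def reconcile(desired, actual):
    """Return the actions that move `actual` toward `desired`."""
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))
    return actions


def apply_actions(actions, desired, actual):
    """Apply the actions to the in-memory cluster state."""
    for verb, name in actions:
        if verb == "delete":
            del actual[name]
        else:
            actual[name] = dict(desired[name])
    return actual


desired = {"index-daily": {"replicas": 2}, "index-hourly": {"replicas": 1}}
actual = {"index-daily": {"replicas": 1}, "index-old": {"replicas": 1}}

actions = reconcile(desired, actual)
apply_actions(actions, desired, actual)
print(actions)
```

A real controller runs this comparison continuously, triggered by watch events, so the cluster converges on the declared state even after failures.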
  8. ML Platform Lykeion
    Photo search is built on Lykeion, an in-house ML platform, and uses the following platform-side features:
    • Training/Serving CRDs and custom controllers
    • Container-based pipelines
    • Training/Serving container image builder
    • Model repository
  9. Architecture

  10. Architecture overview diagram

  11. 1. Creating the Training resource

  12. Creating the Training resource
    • A CronJob creates the Training custom resource
    • The custom controller runs the container-based pipeline configured in the CRD resource
    • Batches run at Hourly, Daily, and Monthly granularity
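  13. Container-based pipeline
    • Each step runs in its own container image
    • This solves the problems of ML pipelines that are sensitive to their environment, such as library dependencies
    • The pipeline DAG is described in YAML
    • Inputs and outputs of each step go through a Persistent Volume (hereafter PV)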
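To make the DAG idea concrete, here is a minimal sketch that orders and runs pipeline steps by their dependencies. It assumes a Python dictionary in place of the YAML file, and the step names are hypothetical, chosen to mirror the steps mentioned in this talk. In the real system each step would run as a container and exchange data through the PV.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: step name -> set of steps it depends on.
PIPELINE = {
    "download_images": set(),
    "extract_features": {"download_images"},
    "build_ann_index": {"extract_features"},
    "upload_assets": {"extract_features", "build_ann_index"},
}


def run_pipeline(dag, run_step):
    """Run every step after all of its dependencies have run."""
    order = list(TopologicalSorter(dag).static_order())
    for step in order:
        run_step(step)
    return order


executed = []
order = run_pipeline(PIPELINE, executed.append)
print(order)
```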
  14. Batch Execution as Custom Resource
    • All batch execution information remains on k8s as CRD resources
    • Because the same processing can be re-run, failure recovery that involves re-running a batch is easy
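  15. 2. Downloading images

  16. Downloading images
    • Item images are downloaded from the Mercari image store on S3
    • This is the most time-consuming step of the pipeline, because the number of images is enormous
    • The images are therefore cached on the PV for a fixed period, so the pipeline can be re-run quickly when re-indexing is needed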
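The caching behavior described above can be sketched as a time-limited file cache. This is an illustration under assumptions: a temporary directory stands in for the PV, `fake_download` stands in for the S3 download, and the one-hour lifetime is arbitrary, since the slides do not give the actual retention period.

```python
import os
import tempfile
import time


def fetch_with_cache(key, download, cache_dir, ttl_seconds, now=None):
    """Return (data, was_cached); re-download when the cached file is stale."""
    now = time.time() if now is None else now
    path = os.path.join(cache_dir, key)
    if os.path.exists(path) and now - os.path.getmtime(path) < ttl_seconds:
        with open(path, "rb") as f:
            return f.read(), True
    data = download(key)
    with open(path, "wb") as f:
        f.write(data)
    return data, False


calls = []


def fake_download(key):
    """Hypothetical stand-in for downloading one image from S3."""
    calls.append(key)
    return b"image-bytes-for-" + key.encode()


with tempfile.TemporaryDirectory() as cache_dir:
    first = fetch_with_cache("item-1.jpg", fake_download, cache_dir, 3600)
    second = fetch_with_cache("item-1.jpg", fake_download, cache_dir, 3600)
    # Simulate a lookup two hours later, after the cache entry has expired.
    expired = fetch_with_cache("item-1.jpg", fake_download, cache_dir, 3600,
                               now=time.time() + 7200)

print(first[1], second[1], expired[1], len(calls))
```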

  17. 3. Uploading assets

  18. Uploading assets
    • The pipeline outputs, the feature vectors and the ANN Index, are saved to the model repository
    • All outputs are stored under version control
    • The model repository is built on GCS

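  19. 4. Building the Serving image

  20. Building the Serving image
    1. A daemon called Image Builder watches the model repository
    2. When a new resource that should be served is added, it automatically builds a Serving container image
      • The container image contains every resource needed for serving, including all ANN Indexes
    3. The built container image is pushed to the container registry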
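  21. 5. Creating the Serving resource

  22. Creating the Serving resource
    • After building the container image, Image Builder creates a Serving custom resource
    • The Serving custom controller creates the required Deployment, Service, and so on, based on the settings in the CRD resource
    • In this system, each ANN Index that is built is deployed as a separate Index service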
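  23. 6. Service discovery

  24. Service discovery
    • The Index services that exist on the cluster are obtained automatically through k8s
    • Index services of different periods and granularities (Hourly, Daily, Monthly) are combined automatically so that indexes with the largest possible granularity are used
    • gRPC is used as the protocol between the REST layer and the Index services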
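One way to read "use the largest possible granularity" is as a greedy covering of the requested time range. The sketch below is an assumption about how such a combination could work, not the talk's actual algorithm: each index service is modeled as a hypothetical `(name, start, end)` tuple, and at each step the widest index that starts at the current position and fits inside the range is chosen.

```python
from datetime import datetime


def select_indexes(indexes, start, end):
    """Greedily cover [start, end), preferring the widest index at each step."""
    chosen = []
    cursor = start
    while cursor < end:
        candidates = [ix for ix in indexes if ix[1] == cursor and ix[2] <= end]
        if not candidates:
            raise LookupError(f"no index service covers {cursor}")
        best = max(candidates, key=lambda ix: ix[2])  # widest coverage wins
        chosen.append(best[0])
        cursor = best[2]
    return chosen


# Hypothetical index services discovered on the cluster.
INDEXES = [
    ("monthly-2019-05", datetime(2019, 5, 1), datetime(2019, 6, 1)),
    ("daily-2019-05-01", datetime(2019, 5, 1), datetime(2019, 5, 2)),
    ("daily-2019-06-01", datetime(2019, 6, 1), datetime(2019, 6, 2)),
    ("hourly-2019-06-02T00", datetime(2019, 6, 2, 0), datetime(2019, 6, 2, 1)),
    ("hourly-2019-06-02T01", datetime(2019, 6, 2, 1), datetime(2019, 6, 2, 2)),
]

selected = select_indexes(INDEXES, datetime(2019, 5, 1), datetime(2019, 6, 2, 2))
print(selected)
```

The greedy choice works when index boundaries nest, as with months, days, and hours. With arbitrary overlapping intervals it could reach a dead end and would need backtracking.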
  25. Overview diagram revisited

  26. Conclusion

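  27. Backend infrastructure for photo search
    1. A container-based system with high reproducibility
    2. It makes use of k8s features such as CRDs, custom controllers, and service discovery
    3. It uses features provided by the ML platform, such as Batch Execution as Custom Resource, to build a robust system
    4. Abstracting the cloud infrastructure with k8s lets us take the best parts of each cloud vendor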
  28. Next Future

  29. Realtime image search
    • Uses so-called Edge AI technology to run most of the inference needed for search on the edge side
    • Achieves real-time interaction
    • A large benefit for UX
  30. Listing Dispatcher
    • Suggests the appropriate listing method
    • Simplifies a complicated listing flow
    • The ultimate goal: listing is completed just by holding the phone over the item!

  31. What must happen to make DNN work on edge
    • Various trade-offs exist:
      • Accuracy
      • Latency
      • Energy consumption
      • Model size
    • To achieve the target UX, these must be balanced on both the algorithm side and the engineering side
    Figures: the cost differs by operation [1]; share of GPU interfaces supported on mobile devices [2]. Image credits: [1], [2].
  32. Landscape of execution environment
    Figures quoted from Facebook's report [2]. Image credit: [2].
    How can we broaden the coverage of devices that can deliver the target UX?
  33. Designing efficient networks: Manual efforts - MobileNets V1, V2 and V3 ([3], [4], & [5])
    • V1: uses depthwise separable convolution to reduce the amount of computation
    • V2: uses inverted residuals with linear bottlenecks to reduce the amount of memory access
    • V3: uses h-swish as the activation function, channel attention based on squeeze & excitation, a lightweight final block, etc.
    Image credits: [3], [4], [5].
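Two of the techniques named on this slide are easy to check numerically. The sketch below counts multiply-adds for a standard convolution versus a depthwise separable one, using the cost model from the MobileNets paper, and implements h-swish as defined in the MobileNetV3 paper, x * ReLU6(x + 3) / 6. The layer sizes are arbitrary example values, not figures from the talk.

```python
def standard_conv_mult_adds(k, c_in, c_out, h, w):
    """Multiply-adds of a k x k standard convolution on an h x w feature map."""
    return k * k * c_in * c_out * h * w


def depthwise_separable_mult_adds(k, c_in, c_out, h, w):
    """Multiply-adds of a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in * h * w
    pointwise = c_in * c_out * h * w
    return depthwise + pointwise


def relu6(x):
    return min(max(x, 0.0), 6.0)


def h_swish(x):
    """Hard swish: a piecewise-linear approximation of swish."""
    return x * relu6(x + 3.0) / 6.0


# Example layer: 3x3 kernel, 64 -> 128 channels, 56x56 feature map.
standard = standard_conv_mult_adds(3, 64, 128, 56, 56)
separable = depthwise_separable_mult_adds(3, 64, 128, 56, 56)
ratio = separable / standard  # equals 1/c_out + 1/k**2

print(standard, separable, round(ratio, 4))
print(h_swish(-3.0), h_swish(0.0), h_swish(3.0))
```

For a 3x3 kernel the ratio approaches 1/9 as the number of output channels grows, which is where the commonly quoted 8 to 9 times reduction comes from.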
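  34. Designing efficient networks: Automated ways
    - Ordinary model training: for a model with training parameters [...], taking [...] as input and [...] as output, search for the [...] that minimizes [...]
    - Search and training in architecture search: architecture parameters [...] are added; search for the [...] and [...] that minimize [...]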
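The mathematical symbols on this slide did not survive extraction. As a hedged reconstruction, the two problems are commonly written as follows. The symbols are assumptions and not the speaker's notation: f is the model, w its trainable weights, α the architecture parameters, (x, y) an input and output pair, and L the loss.

```latex
% Ordinary training: search only over the weights w.
w^{*} = \arg\min_{w} \; \mathcal{L}\bigl(f(x; w),\, y\bigr)

% Architecture search: search jointly over the weights w
% and the architecture parameters \alpha.
(w^{*}, \alpha^{*}) = \arg\min_{w,\, \alpha} \; \mathcal{L}\bigl(f(x; w, \alpha),\, y\bigr)
```

Hardware-aware methods such as those cited in this talk typically add a latency or cost term to the loss, which is omitted here.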
  35. Two influential yet costly approaches
    MnasNet [7] (RL-based)
    • Samples several thousand models from the search space
    • Trains each sampled child model from scratch
    • It can search a huge search space, but realistic iteration requires enormous compute resources
    FBNet [8] (Differentiable)
    • Adopts a DARTS-based search method
    • Every operation in the search space has to be held in GPU memory, so GPU memory usage becomes the problem: a sampled proxy dataset is needed, or the batch size cannot be increased
    Image credits: [7], [8].
  36. Our approach: Single-Path NAS [9]
    • Its search-space formulation, called the superkernel, cuts down search time that would otherwise take hundreds to thousands of GPU hours
    • Performance comparison between MobileNet-V2 and a model generated with Single-Path NAS (SPNAS):

    Device  SoC generation (Snapdragon)  Model        ImageNet Top-1 accuracy (%)*  Latency (ms)*
    A       845                          SPNAS        74.48                         77.90
    A       845                          MobileNetV2  71.80                         76.36
    B       808                          SPNAS        73.07                         113.92
    B       808                          MobileNetV2  71.80                         162.82
    C       670                          SPNAS        73.15                         92.14
    C       670                          MobileNetV2  71.80                         111.85
    D       801                          SPNAS        71.93                         84.65
    D       801                          MobileNetV2  71.80                         120.82

    * All results are for float32. Image credit: [9].
    Is it realistic to define the search space based on MobileNet-V3 and search with a reasonably priced NAS method such as SPNAS?
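The table is easier to read as per-device differences. The following sketch only does arithmetic on the numbers reported above; the variable and function names are hypothetical.

```python
# (Top-1 accuracy in %, latency in ms) per device and model, from the table.
RESULTS = {
    "A": {"SPNAS": (74.48, 77.90), "MobileNetV2": (71.80, 76.36)},
    "B": {"SPNAS": (73.07, 113.92), "MobileNetV2": (71.80, 162.82)},
    "C": {"SPNAS": (73.15, 92.14), "MobileNetV2": (71.80, 111.85)},
    "D": {"SPNAS": (71.93, 84.65), "MobileNetV2": (71.80, 120.82)},
}


def compare(results):
    """Return {device: (accuracy gain, latency change)} of SPNAS vs MobileNetV2."""
    deltas = {}
    for device, models in results.items():
        acc_s, lat_s = models["SPNAS"]
        acc_m, lat_m = models["MobileNetV2"]
        deltas[device] = (round(acc_s - acc_m, 2), round(lat_s - lat_m, 2))
    return deltas


deltas = compare(RESULTS)
for device, (acc_gain, lat_change) in deltas.items():
    print(f"{device}: accuracy {acc_gain:+.2f} points, latency {lat_change:+.2f} ms")
```

On devices B, C, and D the SPNAS model is both more accurate and faster. On device A it is more accurate but about 1.5 ms slower than MobileNetV2.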
  37. Backend System for Edge
    Stages: Training, Optimizer, Distribute
    • Architecture search
    • Weight pruning
    • Dense-Sparse-Dense training
    • Quantization
    • K-means cluster
    • Execution engine selection
    • Model version management
    • A/B testing
    • Distribute model
    An edge product cannot stand on the client side alone. It also needs:
    • A real-time bidirectional protocol
    • A mechanism for generating device-efficient models
    • A mechanism for distributing models dynamically, etc.
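As an illustration of the Quantization item, here is a minimal sketch of affine 8-bit quantization of a weight list. It is a generic textbook scheme written for this note, not the optimizer used in the talk, and the weight values are made up.

```python
def quantize(values, num_bits=8):
    """Map floats to unsigned integers using a scale and a zero point."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Always include 0.0 in the range so that zero is representable exactly.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid a zero scale
    zero_point = round(qmin - lo / scale)
    quantized = [min(qmax, max(qmin, round(v / scale) + zero_point))
                 for v in values]
    return quantized, scale, zero_point


def dequantize(quantized, scale, zero_point):
    """Recover approximate floats from the quantized integers."""
    return [(q - zero_point) * scale for q in quantized]


weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
quantized, scale, zero_point = quantize(weights)
restored = dequantize(quantized, scale, zero_point)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(quantized, max_error)
```

Each weight now takes one byte instead of four, at the cost of a rounding error bounded by the scale, which is one face of the accuracy versus model size trade-off discussed earlier.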
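  38. Edge function architecture
    Server side
    • TensorFlow is used as the server-side DNN framework
    • The pipeline is built with Lykeion
    Client side
    • TensorFlow Lite + MediaPipe are used on the client side
    • With MediaPipe, pre-processing and post-processing can also be made efficient using SIMD and similar techniques
    Diagram labels: (TF Lite), (Optional).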
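  39. Conclusion Again

  40. Edge AI Technology
    • It achieves real-time interaction, which is a large benefit for UX
    • However, it is also true that there are more things to consider: the model, the runtime, and even the backend
    • To achieve the target UX, accuracy, latency, and the other factors have to be balanced
    • Accuracy is not necessarily the top priority
    • As the technology becomes mainstream, runtimes and modeling methods are expected to keep evolving rapidly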
  41. References
    [1] Lai, Liangzhen, Naveen Suda, and Vikas Chandra. "Not all ops are created equal!" arXiv preprint arXiv:1801.04326 (2018).
    [2] Wu, Carole-Jean, David Brooks, Kevin Chen, Douglas Chen, Sy Choudhury, Marat Dukhan, Kim Hazelwood et al. "Machine learning at Facebook: Understanding inference at the edge." In 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 331-344. IEEE, 2019.
    [3] Howard, Andrew G., Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. "MobileNets: Efficient convolutional neural networks for mobile vision applications." arXiv preprint arXiv:1704.04861 (2017).
    [4] Sandler, Mark, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. "MobileNetV2: Inverted residuals and linear bottlenecks." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4510-4520. 2018.
    [5] Howard, Andrew, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang et al. "Searching for MobileNetV3." arXiv preprint arXiv:1905.02244 (2019).
    [6] Zoph, Barret, and Quoc V. Le. "Neural architecture search with reinforcement learning." arXiv preprint arXiv:1611.01578 (2016).
    [7] Tan, Mingxing, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, and Quoc V. Le. "MnasNet: Platform-aware neural architecture search for mobile." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2820-2828. 2019.
    [8] Wu, Bichen, Xiaoliang Dai, Peizhao Zhang, Yanghan Wang, Fei Sun, Yiming Wu, Yuandong Tian, Peter Vajda, Yangqing Jia, and Kurt Keutzer. "FBNet: Hardware-aware efficient ConvNet design via differentiable neural architecture search." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 10734-10742. 2019.
    [9] Stamoulis, Dimitrios, Ruizhou Ding, Di Wang, Dimitrios Lymberopoulos, Bodhi Priyantha, Jie Liu, and Diana Marculescu. "Single-Path NAS: Designing hardware-efficient ConvNets in less than 4 hours." arXiv preprint arXiv:1904.02877 (2019).
  42. Thank you all for coming today