Kanazawa_AI.pdf

 Kanazawa_AI.pdf

Transcript

  1. 2.

    தՏ ޺จ(Nakagawa Hirofumi) • ෋ࢁݝग़਎ • ϝϧΧϦ΁͸2017೥7݄ೖࣾ ex-MIRACLE LINUX, ex-Cerevo,

    ex-mixi, ex-Drivemode(co-founder) • ॴଐ͸SRE→AI/MLνʔϜ • σόΠευϥΠό։ൃ͔ΒϑϩϯτΤϯυ։ൃ·Ͱ΍ ΔԿͰ΋԰ɻ࠷ۙͰ͸MLͱRobotics͕ओઓ৔ͱͳΓ ͭͭ͋Δɻ Twitter: hnakagawa14 GitHub: hnakagawa 2 ࣗݾ঺հ
  2. 5.

    جຊతͳࣸਅݕࡧͷ࢓૊Έ 5 Deep Neural Networks (DNN)Λ࢖༻ͯ͠঎඼ը૾ ͔Βಛ௃ϕΫτϧΛऔಘ औಘͨ͠ಛ௃ϕΫτϧΛ Approximate Nearest

    Neighbor Index(ANN Index) ʹ௥Ճͯ͠ը૾indexΛߏங ݕࡧ࣌ʹ͸ಉ͘͡঎඼ը૾͔Β DNNΛհͯ͠ಛ௃ϕΫτϧΛऔ ಘ͠ɺANN Index͔Βݕࡧ 2 3 1
  3. 6.

    What is Kubernetes • KubernetesʢҎԼk8sʣͱ͸Φʔϓϯιʔε ͷίϯςφɾΦʔέετϨʔγϣϯγες Ϝ • k8sʹ͸Custom Resource

    Definitionͱݺ͹ ΕΔಠࣗͷϦιʔεΛఆٛͰ͖Δػೳ͕͋ Γɺ։ൃऀ͸ͦͷػೳΛհͯ͠k8sͷػೳΛ ֦ுͰ͖Δ • Amazon Elastic Container Service for Kubernetes (Amazon EKS) ͱ͸k8sͷϚω ʔδυɾαʔϏεɺίϯτϩʔϧϓϨʔϯ ͷ؅ཧΛߦͬͯ͘ΕΔ 6
  4. 7.

    What is Custom Resource Definition • Custom Resource DefinitionʢҎԼCRDʣͱ ͸ಠࣗʹϦιʔεΛఆٛͰ͖Δk8sͷػೳ

    • CRDɾϦιʔεͱɺΧελϜɾίϯτϩʔ ϥͰߏ੒͞ΕΔ • ΧελϜɾίϯτϩʔϥ͕CRDɾϦιʔε ͷϥΠϑαΠΫϧ/ঢ়ଶʹԠͯ͡Ϋϥελͷ ঢ়ଶΛίϯτϩʔϧ͢Δ 7
  5. 8.

    ML Platform Lykeion ࣸਅݕࡧ͸Lykeionͱݺ͹ΕΔ಺੡ͷML Platform্ ʹߏங͞Ε͓ͯΓɺԼهͷػೳ͸Platformଆͷػೳ Λ࢖༻͍ͯ͠Δ 8 • Training/Serving

    CRD & ΧελϜίϯτϩʔϥ • ίϯςφϕʔεɾύΠϓϥΠϯ • Training/Serving ίϯςφΠϝʔδɾϏϧμʔ • ϞσϧɾϨϙδτϦ
  6. 27.

    ࣸਅݕࡧͷόοΫΤϯυɾΠϯϑϥ 1. ίϯςφɾϕʔεͷ࠶ݱੑͷߴ͍γεςϜ 2. k8sͷCRD/ΧελϜɾίϯτϩʔϥ΍αʔϏεɾσΟεΧόϦ౳ͷػೳΛ׆༻ 3. Batch Execution as Custom

    Resource౳ɺML PlatformͰ࣮ݱ͞Ε͍ͯΔػೳΛ࢖༻ ͠ɺϩόετͳγεςϜΛߏங 4. Ϋϥ΢υɾΠϯϑϥΛk8sͰந৅Խ͢ΔࣄʹΑͬͯɺ֤Ϋϥ΢υɾϕϯμͷྑ͍ͱ͜औΓ Λ͍ͯ͠Δ 27
  7. 31.

    What must happen to make DNN work on edge •

    ༷ʑͳτϨʔυΦϑ໰୊͕ଘࡏ͢Δ • Accuracy • Latency • Energy consumption • Model size • ໨తͷUXΛୡ੒͢ΔͨΊʹɺΞϧΰϦ ζϜɺΤϯδχΞϦϯά྆ํͰͦΕΒͷ όϥϯεΛߟྀ͢Δඞཁ͕͋Δ 31 Image credit: [1] Image credit: [1] Image credit: [2] ɾΦϖϨʔγϣϯʹΑͬͯίετ͕ҧ͏[1] ɾmobile deviceͰαϙʔτ͍ͯ͠ΔGPUΠϯλʔϑΣʔεͷγΣΞ[2]ɹ
  8. 32.

    Landscape of execution environment Image credit: [2] Image credit: [2]

    ※ FacebookͷϨϙʔτ͔ΒͷҾ༻[2] ೗Կʹ໨తͷUXΛ࣮ݱͰ͖ΔσόΠεͷΧόϨοδΛ޿͛Δ͔? 32
  9. 33.

    Designing efficient networks: Manual efforts - Mobile Nets V1, V2

    and V3([4], [5], & [6]) • Depthwise separable conv Λ࢖༻͠ܭࢉྔΛ ௿ݮ • Inverted residual with linear bottleneck Λ࢖༻͠ ϝϞϦΞΫηεྔΛ ௿ݮ Image credit: [4] Image credit: [5] Image credit: [6] • ׆ੑԽؔ਺ʹh-swishΛ࢖༻ • squeeze & excitationΛ࢖ ༻ͨ͠channelͷAttention • ܰྔͳfinal blockͷ࠾༻ etc... 33
  10. 34.

    Designing efficient networks: Automated ways - ௨ৗͷϞσϧͷτϨʔχϯά ΛτϨʔχϯάɾύϥϝʔλͱͨ͠Ϟσϧ Λೖྗɺ Λग़ྗͱͯ͠ɺ

    Λ࠷খԽ͢Δ ୳ࡧ͢Δ - ΞʔΩςΫνϟɾαʔνͰͷ୳ࡧ&τϨʔχϯά ΞʔΩςΫνϟɾύϥϝʔλ ௥Ճ Λ࠷খԽ͢Δ Λ୳ࡧ͢Δ ͱ 34
  11. 35.

    Two influential yet costly approaches 35 MnasNet[7] (RL-Based) FBNet[8] (Differentiable)

    • ୳ࡧۭ͔ؒΒ਺ઍͷmodelΛsampling͢Δ • sample͞Εͨchild modelΛεΫϥον͔Β τϨʔχϯά͢Δ • ڊେͳ୳ࡧۭ͔ؒΒ୳ࡧͰ͖Δ͕ɺݱ࣮తͳ ΠςϨʔγϣϯΛߦ͏ҝʹɺڊେͳܭࢉػϦ ιʔε͕ඞཁʹͳΔ • DARTSϕʔεͷ୳ࡧख๏Λ࠾༻͍ͯ͠Δ • ୳ࡧۭؒ಺ͷ֤ΦϖϨʔγϣϯΛGPUϝϞϦʹ৐ ͤΔඞཁ͕͋Δҝɺ݁ہGPUϝϞϦͷ࢖༻ྔ͕໰ ୊ͱͳΓɺsample͞Εͨproxy dataset͕ඞཁʹ ͳͬͨΓɺbatch sizeΛ্͛ΒΕͳ͔ͬͨΓ͢Δ Image credit: [7] Image credit: [8]
  12. 36.

    Our approach 36 Single-Path NAS[9] Device SoC Generation (Snapdragon) Model

    ImageNet Top-1 Accuracy* Latency (ms)* A 845 SPNAS 74.48 77.90 A 845 MobileNetV2 71.80 76.36 B 808 SPNAS 73.07 113.92 B 808 MobileNetV2 71.80 162.82 C 670 SPNAS 73.15 92.14 C 670 MobileNetV2 71.80 111.85 D 801 SPNAS 71.93 84.65 D 801 MobileNetV2 71.80 120.82 Image credit: [9] * All results are for float32 • superkenelͱ͍͏୳ࡧۭؒͷઃఆ ख๏Ͱɺ਺ඦʙ਺ઍGPU͔͔࣌ؒ Δ୳ࡧ࣌ؒΛ࡟ݮ͢Δ͜ͱ͕Ͱ͖Δ • MobileNet-V2ͱSingle-Path NAS(SPNAS)Ͱੜ੒ͨ͠Ϟσϧͱ ͷੑೳൺֱ MobileNet-V3Λϕʔεʹ୳ࡧۭؒΛઃఆ͠ɺSPNAS౳ͷϦʔζφϒϧͳNASख๏Ͱ୳ࡧ Λߦ͏ͷ͕ݱ࣮త?
  13. 37.

    Backend System for Edge 37 Training Optimizer Distribute • Architecture

    search • Weight pruning • Dense-Sparse-Dense training • Quantization • K-means cluster • Execution engine selection • Model version management • A/B testing • Distribute model ɾEdgeϓϩμΫτ͸ΫϥΠΞϯταΠυ͚ͩͰ͸੒ཱ͠ͳ͍ ɾRealtime ૒ํ޲ϓϩτίϧ ɾDevice efficient ͳmodelΛੜ੒͢Δ࢓૊Έ ɾmodelΛಈత഑෍͢Δ࢓૊Έ etc..
  14. 38.

    Edge function architecture 38 38 Server side Client side •

    αʔόଆͷDNN frameworkʹ͸ TensorFlowΛ࢖༻ • LykeionʹΑͬͯύΠϓϥΠϯΛ ߏங • ΫϥΠΞϯτଆͰ͸TensorFlow Lite + MediaPipeΛ࢖༻ • MediaPipeΛ࢖༻͢Δ͜ͱͰલॲ ཧ΍ޙॲཧ΋SIMD౳Λ࢖༻ͯ͠ ޮ཰ԽͰ͖Δ (TF Lite) (Optional)
  15. 40.

    Edge AI Technology 40 40 • ϦΞϧλΠϜͳΠϯλϥΫγϣϯΛ࣮ݱ͠UX্େ͖ͳϝϦοτ͕༗Δ • ͔͠͠Model΍Runtimeɺ͞Βʹ͸Backendͱߟྀ͢΂͖ࣄฑ͕ଟ͘ͳΔͷ΋ࣄ࣮ •

    ໨తͷUXΛ࣮ݱ͢ΔͨΊʹɺ Accuracy΍Latency౳ͷόϥϯεΛऔΔඞཁ͕͋Δ • ඞͣ͠΋Accuracy͕࠷༏ઌͰ͸ͳ͍ • ࠓޙҰൠԽ͞ΕΔաఔͰɺҾ͖ଓ͖Runtime΍Modelingख๏ͷٸܹͳਐԽ͕༧૝͞ΕΔ
  16. 41.

    References 41 [1] Lai, Liangzhen, Naveen Suda, and Vikas Chandra.

    "Not all ops are created equal!." arXiv preprint arXiv:1801.04326 (2018). [2] Wu, Carole-Jean, David Brooks, Kevin Chen, Douglas Chen, Sy Choudhury, Marat Dukhan, Kim Hazelwood et al. "Machine learning at facebook: Understanding inference at the edge." In 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 331-344. IEEE, 2019. [3] Howard, Andrew G., Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. "Mobilenets: Efficient convolutional neural networks for mobile vision applications." arXiv preprint arXiv:1704.04861 (2017). [4] Sandler, Mark, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. "Mobilenetv2: Inverted residuals and linear bottlenecks." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4510-4520. 2018. [5] Howard, Andrew, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang et al. "Searching for mobilenetv3." arXiv preprint arXiv:1905.02244 (2019). [6] Zoph, Barret, and Quoc V. Le. "Neural architecture search with reinforcement learning." arXiv preprint arXiv:1611.01578 (2016). [7] Tan, Mingxing, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, and Quoc V. Le. "Mnasnet: Platform- aware neural architecture search for mobile." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2820-2828. 2019. [8] Wu, Bichen, Xiaoliang Dai, Peizhao Zhang, Yanghan Wang, Fei Sun, Yiming Wu, Yuandong Tian, Peter Vajda, Yangqing Jia, and Kurt Keutzer. "Fbnet: Hardware-aware efficient convnet design via differentiable neural architecture search." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 10734-10742. 2019. [9] Stamoulis, Dimitrios, Ruizhou Ding, Di Wang, Dimitrios Lymberopoulos, Bodhi Priyantha, Jie Liu, and Diana Marculescu. "Single- path nas: Designing hardware-efficient convnets in less than 4 hours." arXiv preprint arXiv:1904.02877 (2019).