Kanazawa_AI.pdf

Transcript

  1. The Backend Behind Mercari's Photo Search and
    the Outlook for Edge AI Technology in Products
    Mercari, Inc.  Hirofumi Nakagawa

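  2. Self-introduction: Hirofumi Nakagawa (中河 宏文)
    • From Toyama Prefecture
    • Joined Mercari in July 2017
      (ex-MIRACLE LINUX, ex-Cerevo, ex-mixi, ex-Drivemode (co-founder))
    • Affiliation: SRE → AI/ML team
    • A jack-of-all-trades who does everything from device driver
      development to front-end development. Recently, ML and robotics
      have been becoming the main field.
    Twitter: hnakagawa14
    GitHub: hnakagawa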

  3. Introduction

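  4. What is photo search
    • Photo search is what is commonly called an image search feature
    • The app searches for items using a photo as the query
    • Items can be found from an image even when the item name is unknown
    Video link: https://youtu.be/kTni8EvOCgI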

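  5. How photo search basically works
    1. Use deep neural networks (DNN) to extract feature vectors from
       item images
    2. Add the extracted feature vectors to an Approximate Nearest
       Neighbor index (ANN index) to build the image index
    3. At search time, extract a feature vector from the query photo
       through the same DNN and search the ANN index with it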
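The three steps above can be sketched as follows. This is a minimal illustration, not Mercari's implementation: the feature vectors are made-up numbers standing in for DNN output, and `BruteForceIndex` is a hypothetical exact-search class used in place of a real ANN index so that the example runs with the standard library only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class BruteForceIndex:
    """Exact nearest-neighbor index (hypothetical stand-in for an ANN index)."""

    def __init__(self):
        self._items = []  # list of (item_id, vector)

    def add(self, item_id, vector):
        # Step 2: register a feature vector under an item id.
        self._items.append((item_id, list(vector)))

    def search(self, query, k=1):
        # Step 3: rank all stored vectors by similarity to the query.
        scored = [(cosine_similarity(query, vec), item_id)
                  for item_id, vec in self._items]
        scored.sort(reverse=True)
        return [item_id for _, item_id in scored[:k]]


# Step 1 is assumed: these vectors stand in for DNN feature extraction.
index = BruteForceIndex()
index.add("shoe", [0.9, 0.1, 0.0])
index.add("bag", [0.1, 0.9, 0.1])
index.add("watch", [0.0, 0.2, 0.9])

result = index.search([0.8, 0.2, 0.1], k=2)
print(result)
```

A real ANN index trades a little recall for much faster lookups than this linear scan, which matters once the number of item images becomes very large.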

  6. What is Kubernetes
    • Kubernetes (hereafter k8s) is an open-source container
      orchestration system
    • k8s has a feature called Custom Resource Definition that lets you
      define your own resources; developers can extend k8s through it
    • Amazon Elastic Container Service for Kubernetes (Amazon EKS) is a
      managed k8s service that manages the control plane for you

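  7. What is Custom Resource Definition
    • Custom Resource Definition (hereafter CRD) is the k8s feature for
      defining your own resources
    • It consists of CRD resources and a custom controller
    • The custom controller controls the state of the cluster according
      to the lifecycle and state of the CRD resources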
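The controller behaviour described above is usually written as a reconcile loop that compares desired state with observed state. The sketch below is a simplified, library-free illustration of that idea; the `reconcile` function and the resource names are hypothetical and do not use the real Kubernetes client API.

```python
def reconcile(desired, observed):
    """Return the actions that move the observed state toward the desired state.

    Both arguments map a resource name to its spec (a plain dict here).
    """
    actions = []
    for name, spec in desired.items():
        if name not in observed:
            actions.append(("create", name))
        elif observed[name] != spec:
            actions.append(("update", name))
    for name in observed:
        if name not in desired:
            actions.append(("delete", name))
    return actions


# Hypothetical state: what the custom resources ask for vs. what is running.
desired = {"index-daily": {"replicas": 2}, "index-hourly": {"replicas": 1}}
observed = {"index-daily": {"replicas": 1}, "index-old": {"replicas": 1}}

actions = reconcile(desired, observed)
print(actions)
```

A real controller would run this comparison every time a watched resource changes and would apply the actions through the k8s API rather than return them.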

  8. ML Platform Lykeion
    Photo search is built on an in-house ML platform called Lykeion and
    uses the following platform features:
    • Training/Serving CRDs and custom controllers
    • Container-based pipelines
    • Training/Serving container image builder
    • Model repository

  9. Architecture

  10. Architecture overview diagram

  11. 1. Creating the Training resource

  12. Creating the Training resource
    • A CronJob creates the Training custom resource
    • The custom controller runs the container-based pipeline configured
      in the CRD resource
    • Batches run at three granularities: Hourly, Daily, and Monthly

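  13. Container-based pipeline
    • Each step runs in its own container image
    • This solves the problem that ML pipelines are sensitive to their environment, such as library dependencies
    • The pipeline DAG is written in YAML
    • The inputs and outputs of each step are passed through a Persistent Volume (hereafter PV)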
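A pipeline DAG like the one described above can be turned into an execution order with a topological sort. The sketch below is only an illustration: the step names are hypothetical (chosen to match the stages in this talk), the DAG is written as a Python dict rather than the YAML Lykeion uses, and the actual container execution is omitted.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: each step maps to the set of steps it depends on.
pipeline = {
    "download_images": set(),
    "extract_features": {"download_images"},
    "build_ann_index": {"extract_features"},
    "upload_assets": {"extract_features", "build_ann_index"},
}


def execution_order(dag):
    """Return the steps in an order that respects every dependency."""
    return list(TopologicalSorter(dag).static_order())


order = execution_order(pipeline)
print(order)
```

In a container-based pipeline each name in `order` would correspond to one container image, with the shared PV carrying files from one step to the next.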

  14. Batch Execution as Custom Resource
    • All batch execution information remains on k8s as CRD resources
    • Because the same processing can be re-run, failure recovery that
      requires re-running a batch is easy

  15. 2. Downloading images

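  16. Downloading images
    • Item images are downloaded from the Mercari image store on S3
    • This is the most time-consuming step in the pipeline, because the number of images is enormous
    • The images are therefore cached on the PV for a fixed period, so that the
      pipeline can be run quickly when re-indexing is needed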
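The caching idea above can be sketched as a time-to-live check on files in the cache directory. Everything here is an assumption for illustration: the function names, the `.jpg` naming, the seven-day default, and the injected `download` callable (which stands in for the S3 download) are hypothetical and not taken from the talk.

```python
import os
import time


def is_cache_fresh(path, ttl_seconds, now=None):
    """True if `path` exists and was modified less than ttl_seconds ago."""
    if not os.path.exists(path):
        return False
    now = time.time() if now is None else now
    return (now - os.path.getmtime(path)) < ttl_seconds


def fetch_image(image_id, cache_dir, download, ttl_seconds=7 * 24 * 3600):
    """Return cached bytes when fresh; otherwise download and cache them.

    `download` is a caller-supplied function: image_id -> bytes.
    """
    path = os.path.join(cache_dir, f"{image_id}.jpg")
    if is_cache_fresh(path, ttl_seconds):
        with open(path, "rb") as f:
            return f.read()
    data = download(image_id)
    with open(path, "wb") as f:
        f.write(data)
    return data
```

With a cache like this on the PV, a re-indexing run only pays the download cost for images that are missing or older than the retention period.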

  17. 3. Uploading assets

  18. Uploading assets
    • The pipeline artifacts (feature vectors and the ANN index) are stored in the model repository
    • All artifacts are stored under version control
    • The model repository is built on GCS

  19. 4. Building the Serving image

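  20. Building the Serving image
    1. A daemon called Image Builder watches the model repository
    2. When a new resource to be served is added, it automatically builds a Serving container image
      • The container image contains every resource needed for serving, such as the ANN index
    3. The built container image is pushed to the container registry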

  21. 5. Creating the Serving resource

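  22. Creating the Serving resource
    • After building the container image, Image Builder creates a
      Serving custom resource
    • The Serving custom controller creates the required Deployment,
      Service, and so on from the settings in the CRD resource
    • In this system, each built ANN index is deployed as a separate
      Index service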

  23. 6. Service discovery

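  24. Service discovery
    • The Index services that exist in the cluster are discovered
      automatically through k8s
    • Index services of different periods and granularities (Hourly,
      Daily, Monthly) are combined automatically so that the coarsest
      available index is used wherever possible
    • gRPC is used as the protocol between the REST layer and the Index
      services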
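The talk does not say how the indexes are combined, so the following is only one possible strategy: greedily prefer the index covering the widest time span and skip any index that overlaps one already chosen. The `select_indexes` function, the dict layout, and the hour-based timestamps are all hypothetical.

```python
def select_indexes(available):
    """Greedily pick non-overlapping indexes, widest time span first.

    Each index is a dict with "name", "start", and "end" (half-open
    interval [start, end), here measured in hours).
    """
    chosen = []
    by_span = sorted(available, key=lambda i: i["end"] - i["start"], reverse=True)
    for idx in by_span:
        overlaps = any(idx["start"] < c["end"] and c["start"] < idx["end"]
                       for c in chosen)
        if not overlaps:
            chosen.append(idx)
    return sorted(chosen, key=lambda i: i["start"])


# Hypothetical discovered services: one monthly, two daily, two hourly.
available = [
    {"name": "monthly-0", "start": 0, "end": 720},
    {"name": "daily-29", "start": 696, "end": 720},
    {"name": "daily-30", "start": 720, "end": 744},
    {"name": "hourly-720", "start": 720, "end": 721},
    {"name": "hourly-744", "start": 744, "end": 745},
]

selected = [i["name"] for i in select_indexes(available)]
print(selected)
```

Here the monthly index covers the older data, one daily index covers the most recent full day, and an hourly index covers the newest items, so the finer-grained indexes that overlap a coarser one are never queried.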

  25. Reviewing the overview diagram

  26. Conclusion

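  27. Backend infrastructure for photo search
    1. A container-based system with high reproducibility
    2. Makes use of k8s features such as CRDs, custom controllers, and service discovery
    3. Builds a robust system on features provided by the ML platform, such as
      Batch Execution as Custom Resource
    4. Abstracts the cloud infrastructure with k8s, which makes it possible to take the best
      parts of each cloud vendor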

  28. Next Future

  29. Realtime image search
    • Uses so-called Edge AI technology to run most of the inference
      needed for search on the edge side
    • Achieves real-time interaction
    • This is a major benefit for UX

  30. Listing Dispatcher
    • Suggests the appropriate listing method
    • Simplifies the complex listing flow
    • The ultimate goal is to complete a listing just by holding the camera over the item!!

  31. What must happen to make DNN work on edge
    • Various trade-offs exist
      • Accuracy
      • Latency
      • Energy consumption
      • Model size
    • To achieve the target UX, the balance among them has to be
      considered on both the algorithm side and the engineering side
    Figures:
    • Cost differs from operation to operation (image credit: [1])
    • Share of GPU interfaces supported on mobile devices (image credit: [2])

  32. Landscape of execution environment
    Figures quoted from Facebook's report (image credit: [2])
    How can we widen the coverage of devices that can deliver the target UX?

  33. Designing efficient networks: Manual efforts
    - MobileNets V1, V2, and V3 ([3], [4], [5])
    V1 (image credit: [3])
    • Uses depthwise separable convolutions to reduce the amount of
      computation
    V2 (image credit: [4])
    • Uses inverted residuals with linear bottlenecks to reduce the
      amount of memory access
    V3 (image credit: [5])
    • Uses h-swish as the activation function
    • Channel attention using squeeze-and-excitation
    • Adopts a lightweight final block
      etc.
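The saving from depthwise separable convolutions can be checked with a small calculation. The sketch below counts multiply-accumulate operations (MACs) using the cost model from the MobileNets paper [3], assuming stride 1 and an output feature map of the same spatial size as the input; the layer sizes in the example are arbitrary.

```python
def standard_conv_macs(k, c_in, c_out, h, w):
    """MACs of a standard k x k convolution on an h x w feature map."""
    return k * k * c_in * c_out * h * w


def depthwise_separable_macs(k, c_in, c_out, h, w):
    """MACs of a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in * h * w   # one k x k filter per input channel
    pointwise = c_in * c_out * h * w   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Arbitrary example layer: 3x3 kernel, 64 -> 128 channels, 56x56 feature map.
std = standard_conv_macs(3, 64, 128, 56, 56)
sep = depthwise_separable_macs(3, 64, 128, 56, 56)
ratio = sep / std
print(f"standard: {std}, separable: {sep}, ratio: {ratio:.3f}")
```

The ratio simplifies to 1/c_out + 1/k², so with 3x3 kernels the separable form needs roughly eight to nine times fewer MACs than the standard convolution.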

  34. Designing efficient networks: Automated ways
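    - Ordinary model training
      Given a model with trainable parameters, an input, and an output, search for the parameters that minimize the objective
    - Search and training in architecture search
      Architecture parameters are added, and the search looks for both the trainable parameters and the architecture parameters that minimize the objective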
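The symbols on this slide did not survive extraction. The two problems can be written as follows in the notation commonly used in the NAS literature; the symbols $w$ (trainable weights), $\alpha$ (architecture parameters), $f$ (the model), $x$ and $y$ (input and output), and $\mathcal{L}$ (the objective) are assumptions and may differ from what the slide showed.

```latex
% Ordinary training: search over the weights w only
w^{*} = \arg\min_{w} \; \mathcal{L}\bigl(f(x; w),\, y\bigr)

% Architecture search: architecture parameters \alpha are added,
% and the search runs over both w and \alpha
(w^{*}, \alpha^{*}) = \arg\min_{w,\, \alpha} \; \mathcal{L}\bigl(f(x; w, \alpha),\, y\bigr)
```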

  35. Two influential yet costly approaches
    MnasNet [7] (RL-based)
    • Samples thousands of models from the search space
    • Trains each sampled child model from scratch
    • Can search a huge search space, but needs huge computing
      resources to iterate in a realistic amount of time
    FBNet [8] (differentiable)
    • Adopts a DARTS-based search method
    • Every operation in the search space has to be held in GPU memory,
      so GPU memory usage ends up being the problem: a sampled proxy
      dataset becomes necessary, or the batch size cannot be increased
    Image credits: [7], [8]

  36. Our approach
    Single-Path NAS [9] (image credit: [9])
    • With a way of formulating the search space called the superkernel,
      search time that would otherwise take hundreds to thousands of
      GPU hours can be reduced
    • Performance comparison between MobileNet-V2 and models generated
      with Single-Path NAS (SPNAS):

    Device  SoC generation (Snapdragon)  Model        ImageNet Top-1 accuracy*  Latency (ms)*
    A       845                          SPNAS        74.48                     77.90
    A       845                          MobileNetV2  71.80                     76.36
    B       808                          SPNAS        73.07                     113.92
    B       808                          MobileNetV2  71.80                     162.82
    C       670                          SPNAS        73.15                     92.14
    C       670                          MobileNetV2  71.80                     111.85
    D       801                          SPNAS        71.93                     84.65
    D       801                          MobileNetV2  71.80                     120.82
    * All results are for float32

    Is it realistic to define the search space based on MobileNet-V3 and run the search with
    an affordable NAS method such as SPNAS?
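The comparison table above is easier to read as per-device differences. The short calculation below uses only the numbers from the table; the `compare` helper and the output field names are introduced here for illustration.

```python
# (device, SPNAS accuracy, SPNAS latency ms, MobileNetV2 accuracy, MobileNetV2 latency ms)
results = [
    ("A", 74.48, 77.90, 71.80, 76.36),
    ("B", 73.07, 113.92, 71.80, 162.82),
    ("C", 73.15, 92.14, 71.80, 111.85),
    ("D", 71.93, 84.65, 71.80, 120.82),
]


def compare(rows):
    """Accuracy gain (points) and latency speedup of SPNAS over MobileNetV2."""
    summary = {}
    for device, acc_spnas, lat_spnas, acc_mnv2, lat_mnv2 in rows:
        summary[device] = {
            "accuracy_gain": round(acc_spnas - acc_mnv2, 2),
            "speedup": round(lat_mnv2 / lat_spnas, 2),
        }
    return summary


summary = compare(results)
for device, row in summary.items():
    print(device, row)
```

On devices B, C, and D the SPNAS model is both more accurate and faster than MobileNetV2; on device A it is more accurate but marginally slower, so its speedup falls just below 1.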

  37. Backend System for Edge
    Training
    ● Architecture search
    ● Weight pruning
    ● Dense-Sparse-Dense training
    Optimizer
    ● Quantization
    ● K-means cluster
    ● Execution engine selection
    Distribute
    ● Model version management
    ● A/B testing
    ● Distribute model
    An Edge product cannot be built on the client side alone. It also needs:
    ・A real-time bidirectional protocol
    ・A mechanism for generating device-efficient models
    ・A mechanism for distributing models dynamically, etc.

  38. Edge function architecture
    Server side
    • TensorFlow is used as the server-side DNN framework
    • The pipeline is built with Lykeion
    Client side
    • TensorFlow Lite + MediaPipe are used on the client side
    • With MediaPipe, pre-processing and post-processing can also be
      made more efficient using SIMD and similar techniques

  39. Conclusion Again

  40. Edge AI Technology
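    • Achieves real-time interaction, which is a major benefit for UX
    • However, it is also true that there is more to consider: the model, the runtime, and even the backend
    • To achieve the target UX, accuracy, latency, and the other factors have to be balanced
      • Accuracy is not necessarily the top priority
    • As the technology becomes mainstream, runtimes and modeling methods are expected to keep evolving rapidly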

  41. References
    [1] Lai, Liangzhen, Naveen Suda, and Vikas Chandra. "Not all ops are created equal!." arXiv preprint arXiv:1801.04326 (2018).
    [2] Wu, Carole-Jean, David Brooks, Kevin Chen, Douglas Chen, Sy Choudhury, Marat Dukhan, Kim Hazelwood et al. "Machine learning at facebook: Understanding inference at the edge." In 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 331-344. IEEE, 2019.
    [3] Howard, Andrew G., Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. "Mobilenets: Efficient convolutional neural networks for mobile vision applications." arXiv preprint arXiv:1704.04861 (2017).
    [4] Sandler, Mark, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. "Mobilenetv2: Inverted residuals and linear bottlenecks." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4510-4520. 2018.
    [5] Howard, Andrew, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang et al. "Searching for mobilenetv3." arXiv preprint arXiv:1905.02244 (2019).
    [6] Zoph, Barret, and Quoc V. Le. "Neural architecture search with reinforcement learning." arXiv preprint arXiv:1611.01578 (2016).
    [7] Tan, Mingxing, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, and Quoc V. Le. "Mnasnet: Platform-aware neural architecture search for mobile." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2820-2828. 2019.
    [8] Wu, Bichen, Xiaoliang Dai, Peizhao Zhang, Yanghan Wang, Fei Sun, Yiming Wu, Yuandong Tian, Peter Vajda, Yangqing Jia, and Kurt Keutzer. "Fbnet: Hardware-aware efficient convnet design via differentiable neural architecture search." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 10734-10742. 2019.
    [9] Stamoulis, Dimitrios, Ruizhou Ding, Di Wang, Dimitrios Lymberopoulos, Bodhi Priyantha, Jie Liu, and Diana Marculescu. "Single-path nas: Designing hardware-efficient convnets in less than 4 hours." arXiv preprint arXiv:1904.02877 (2019).

  42. Thank you all for coming today