Slide 1

Slide 1 text

The Backend Behind Mercari's Photo Search and the Outlook for Edge AI Technology in Products
Mercari, Inc.
Hirofumi Nakagawa

Slide 2

Slide 2 text

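Self-introduction
Hirofumi Nakagawa
• From Toyama Prefecture
• Joined Mercari in July 2017; ex-MIRACLE LINUX, ex-Cerevo, ex-mixi, ex-Drivemode (co-founder)
• Team: SRE → AI/ML team
• A jack-of-all-trades who does everything from device driver development to front-end development. Lately ML and robotics are becoming my main field.
Twitter: hnakagawa14
GitHub: hnakagawa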

Slide 3

Slide 3 text

Introduction

Slide 4

Slide 4 text

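What is Photo Search
• Photo search is what is usually called an image search feature
• Search for items from the app, starting from a photo
• Items can be found from an image even when you do not know the item name
Video link: https://youtu.be/kTni8EvOCgI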

Slide 5

Slide 5 text

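How photo search basically works
1. Use Deep Neural Networks (DNN) to obtain feature vectors from item images
2. Add the obtained feature vectors to an Approximate Nearest Neighbor Index (ANN Index) to build the image index
3. At search time, obtain a feature vector from the query image through the same DNN and search the ANN Index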
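The three steps above can be sketched in a few lines. This is a minimal illustration, not Mercari's implementation: `embed` is a hypothetical placeholder for the DNN, and `BruteForceIndex` is an exact cosine-similarity search standing in for a real ANN index, which would trade a little accuracy for much faster lookups.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


class BruteForceIndex:
    """Exact nearest-neighbor search; an assumed stand-in for an ANN index."""

    def __init__(self):
        self._items = []  # list of (item_id, unit_vector)

    def add(self, item_id, vector):
        self._items.append((item_id, l2_normalize(vector)))

    def search(self, query, k=1):
        q = l2_normalize(query)
        scored = [
            (sum(a * b for a, b in zip(q, vec)), item_id)
            for item_id, vec in self._items
        ]
        scored.sort(reverse=True)  # highest similarity first
        return [item_id for _, item_id in scored[:k]]


def embed(image):
    """Hypothetical DNN placeholder: the 'image' is already a list of floats."""
    return list(image)


# Steps 1 and 2: embed item images and add them to the index.
index = BruteForceIndex()
index.add("item-a", embed([1.0, 0.0, 0.0]))
index.add("item-b", embed([0.0, 1.0, 0.0]))
index.add("item-c", embed([0.7, 0.7, 0.0]))

# Step 3: embed the query photo with the same function and search.
result = index.search(embed([0.9, 0.1, 0.0]), k=2)
```

Using the same embedding function for indexing and querying is the key point: vectors are only comparable when they come from the same model.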

Slide 6

Slide 6 text

What is Kubernetes
• Kubernetes (hereafter k8s) is an open-source container orchestration system
• k8s has a feature called Custom Resource Definition for defining your own resources, and developers can extend k8s through it
• Amazon Elastic Container Service for Kubernetes (Amazon EKS) is a managed k8s service that manages the control plane for you

Slide 7

Slide 7 text

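What is Custom Resource Definition
• Custom Resource Definition (hereafter CRD) is the k8s feature for defining your own resources
• It consists of CRD resources and a custom controller
• The custom controller controls the state of the cluster according to the lifecycle and state of the CRD resources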
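The controller idea can be sketched as a reconcile step that compares the desired state declared in custom resources with the state observed in the cluster. This is a simplified illustration in plain Python with no Kubernetes client; the resource names and the `replicas` field are assumptions made for the example.

```python
def reconcile(desired, observed):
    """Return the actions that move the observed state toward the desired state.

    Both arguments map a resource name to its spec (a plain dict).
    """
    actions = []
    for name, spec in desired.items():
        if name not in observed:
            actions.append(("create", name))
        elif observed[name] != spec:
            actions.append(("update", name))
    for name in observed:
        if name not in desired:
            actions.append(("delete", name))
    return actions


# Hypothetical example state.
desired = {"index-daily": {"replicas": 2}, "index-hourly": {"replicas": 1}}
observed = {"index-daily": {"replicas": 1}, "index-old": {"replicas": 1}}
actions = reconcile(desired, observed)
```

A real custom controller runs this kind of comparison repeatedly, triggered by watch events from the API server, and applies the resulting actions through the Kubernetes API.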

Slide 8

Slide 8 text

ML Platform Lykeion
Photo search is built on an in-house ML platform called Lykeion and uses the following platform-side features:
• Training/Serving CRDs and custom controllers
• Container-based pipelines
• Training/Serving container image builder
• Model repository

Slide 9

Slide 9 text

Architecture

Slide 10

Slide 10 text

Architecture overview diagram

Slide 11

Slide 11 text

1. Creating the Training resource

Slide 12

Slide 12 text

Creating the Training resource
• A CronJob creates the Training custom resource
• The custom controller runs the container-based pipeline configured in the CRD resource
• Batches run in three units: Hourly, Daily, and Monthly
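As a rough sketch of what the CronJob might submit, the function below builds a Training custom resource manifest as a Python dict. The API group `example.com/v1alpha1`, the `spec` fields, and the naming scheme are all hypothetical; the talk does not show Lykeion's actual schema.

```python
from datetime import datetime, timezone

GRANULARITIES = ("hourly", "daily", "monthly")


def training_resource(granularity, scheduled_at):
    """Build a hypothetical Training custom resource for one batch run."""
    if granularity not in GRANULARITIES:
        raise ValueError(f"unknown granularity: {granularity}")
    stamp = scheduled_at.strftime("%Y%m%d%H")
    return {
        "apiVersion": "example.com/v1alpha1",  # assumed API group
        "kind": "Training",
        "metadata": {"name": f"image-index-{granularity}-{stamp}"},
        "spec": {
            "granularity": granularity,   # assumed field
            "pipeline": "image-index",    # assumed pipeline name
        },
    }


manifest = training_resource(
    "daily", datetime(2019, 10, 1, 0, tzinfo=timezone.utc)
)
```

Giving each run its own uniquely named resource is what lets every batch execution remain visible in the cluster afterward.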

Slide 13

Slide 13 text

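Container-based pipeline
• Each step runs in its own container image
• Solves the problem of ML pipelines being sensitive to their environment, such as library dependencies
• The pipeline DAG is written in YAML
• Inputs and outputs of each step pass through a Persistent Volume (hereafter PV)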
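The execution order of such a pipeline follows from a topological sort of its DAG. The sketch below uses Python's standard `graphlib` (Python 3.9+); the DAG is written as a dict here rather than YAML, and the step names are assumptions loosely based on the stages described in this talk.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it depends on (hypothetical names).
pipeline_dag = {
    "download_images": set(),
    "extract_features": {"download_images"},
    "build_index": {"extract_features"},
    "upload_assets": {"build_index"},
}

# static_order() yields every step after all of its dependencies.
execution_order = list(TopologicalSorter(pipeline_dag).static_order())

for step in execution_order:
    # A real runner would start the container image for this step here,
    # mounting the shared volume that carries inputs and outputs.
    print(f"run step: {step}")
```

`TopologicalSorter` raises `graphlib.CycleError` if the steps depend on each other in a cycle, which is a cheap way to validate a pipeline definition before running it.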

Slide 14

Slide 14 text

Batch Execution as Custom Resource
• All batch execution information remains on k8s as CRD resources
• The same processing can be re-executed, so failure recovery work that involves re-running batches is easy

Slide 15

Slide 15 text

2. Downloading images

Slide 16

Slide 16 text

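Downloading images
• Item images are downloaded from the Mercari image store on S3
• This is the most time-consuming step of the pipeline (because the number of images is enormous)
• Images are therefore cached on the PV for a fixed period, so that the pipeline can be run quickly when re-indexing is needed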
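The caching idea can be sketched as a small read-through cache with a time-to-live. This is an illustration under assumptions: the cache directory stands in for the PV, and `download` is a hypothetical callable standing in for the S3 fetch.

```python
import os
import tempfile
import time


def fetch_with_cache(key, cache_dir, download, ttl_seconds):
    """Return cached bytes if they are fresh, otherwise download and cache."""
    path = os.path.join(cache_dir, key)
    fresh = (
        os.path.exists(path)
        and time.time() - os.path.getmtime(path) < ttl_seconds
    )
    if fresh:
        with open(path, "rb") as f:
            return f.read()
    data = download(key)
    with open(path, "wb") as f:
        f.write(data)
    return data


download_calls = []


def fake_download(key):
    """Hypothetical stand-in for the object-store download."""
    download_calls.append(key)
    return b"image-bytes-for-" + key.encode()


with tempfile.TemporaryDirectory() as cache_dir:
    first = fetch_with_cache("item-1.jpg", cache_dir, fake_download, 3600)
    second = fetch_with_cache("item-1.jpg", cache_dir, fake_download, 3600)
```

The second call is served from the cache directory, so a re-index within the retention period skips the expensive download step.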

Slide 17

Slide 17 text

3. Uploading assets

Slide 18

Slide 18 text

Uploading assets
• The pipeline's artifacts, the feature vectors and the ANN Index, are saved to the model repository
• All artifacts are stored under version control
• The model repository is built on GCS

Slide 19

Slide 19 text

4. Building the Serving image

Slide 20

Slide 20 text

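Building the Serving image
1. A daemon called Image Builder watches the model repository
2. When a new resource to be served is added, it automatically builds a Serving container image
   • The container image contains all resources required for serving, such as every ANN Index
3. The built container image is pushed to the container registry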

Slide 21

Slide 21 text

5. Creating the Serving resource

Slide 22

Slide 22 text

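Creating the Serving resource
• After building the container image, Image Builder creates a Serving custom resource
• The Serving custom controller creates the required Deployment, Service, and so on based on the settings in the CRD resource
• In this system, each built ANN Index is deployed as a separate Index service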

Slide 23

Slide 23 text

6. Service discovery

Slide 24

Slide 24 text

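Service discovery
• The Index services that exist on the cluster are discovered automatically through k8s
• Index services of different periods and granularities (Hourly, Daily, Monthly) are combined automatically so that the coarsest-grained Index available is used
• gRPC is the protocol between the REST layer and the Index services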
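One way to combine indexes of different granularity is a greedy cover: starting at the beginning of the requested range, always pick the available index that starts there and spans the longest period. The sketch below assumes each Index service covers a half-open interval measured in hours; the service names and this interval model are hypothetical, and the talk does not describe the actual selection logic.

```python
def select_indexes(available, start, end):
    """Cover [start, end) using as few, as coarse, indexes as possible.

    available: list of (name, begin, finish) half-open hour intervals.
    Greedy strategy; raises LookupError when a gap cannot be covered.
    """
    by_begin = {}
    for name, begin, finish in available:
        by_begin.setdefault(begin, []).append((finish, name))

    chosen = []
    cursor = start
    while cursor < end:
        candidates = [
            (finish, name)
            for finish, name in by_begin.get(cursor, [])
            if cursor < finish <= end
        ]
        if not candidates:
            raise LookupError(f"no index service covers hour {cursor}")
        finish, name = max(candidates)  # longest span starting at cursor
        chosen.append(name)
        cursor = finish
    return chosen


# Hypothetical discovered services: one daily index plus hourly ones.
available = [
    ("hourly-0", 0, 1),
    ("daily-0", 0, 24),
    ("hourly-24", 24, 25),
    ("hourly-25", 25, 26),
]
selected = select_indexes(available, start=0, end=26)
```

Here the first 24 hours are served by the single daily index, and only the most recent two hours fall back to hourly indexes, which keeps the number of services queried per search small.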

Slide 25

Slide 25 text

Recap of the overview diagram

Slide 26

Slide 26 text

Conclusion

Slide 27

Slide 27 text

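Backend infrastructure of photo search
1. A container-based, highly reproducible system
2. Makes use of k8s features such as CRDs, custom controllers, and service discovery
3. Uses features provided by the ML Platform, such as Batch Execution as Custom Resource, to build a robust system
4. By abstracting cloud infrastructure with k8s, it takes the best parts of each cloud vendor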

Slide 28

Slide 28 text

Next Future

Slide 29

Slide 29 text

Realtime image search
• Uses so-called Edge AI technology to run most of the inference needed for search on the edge side
• Achieves real-time interaction
• A big advantage in terms of UX

Slide 30

Slide 30 text

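Listing Dispatcher
• Suggests an appropriate listing method
• Simplifies the complicated listing flow
• The ultimate goal is for a listing to be completed just by holding the camera over the item!!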

Slide 31

Slide 31 text

What must happen to make DNN work on edge
• Various trade-offs exist
  • Accuracy
  • Latency
  • Energy consumption
  • Model size
• To achieve the target UX, these have to be balanced on both the algorithm side and the engineering side
[Figure: cost differs by operation [1]. Image credit: [1]]
[Figure: share of GPU interfaces supported on mobile devices [2]. Image credit: [2]]

Slide 32

Slide 32 text

Landscape of execution environment
[Figures quoted from Facebook's report [2]. Image credit: [2]]
How do we widen the coverage of devices that can deliver the target UX?

Slide 33

Slide 33 text

Designing efficient networks: Manual efforts - MobileNets V1, V2 and V3 ([3], [4], [5])
V1
• Uses depthwise separable convolution to reduce the amount of computation
V2
• Uses inverted residuals with linear bottlenecks to reduce the amount of memory access
V3
• Uses h-swish as the activation function
• Channel attention using squeeze & excitation
• Adopts a lightweight final block, etc.
Image credit: [3], [4], [5]
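To make the computation saving of depthwise separable convolution concrete, the sketch below counts multiply-accumulate operations (MACs) for one layer, following the cost model in the MobileNets paper [3]. The layer shape used in the example is an arbitrary assumption.

```python
def standard_conv_macs(height, width, c_in, c_out, kernel):
    """MACs of a standard convolution with stride 1 and 'same' padding."""
    return height * width * c_in * c_out * kernel * kernel


def depthwise_separable_macs(height, width, c_in, c_out, kernel):
    """MACs of a depthwise conv followed by a 1x1 pointwise conv."""
    depthwise = height * width * c_in * kernel * kernel
    pointwise = height * width * c_in * c_out
    return depthwise + pointwise


# Assumed example layer: 112x112 feature map, 32 -> 64 channels, 3x3 kernel.
shape = dict(height=112, width=112, c_in=32, c_out=64, kernel=3)
standard = standard_conv_macs(**shape)
separable = depthwise_separable_macs(**shape)

# The ratio simplifies to 1/c_out + 1/kernel^2.
ratio = separable / standard
```

For this layer the separable version needs roughly one eighth of the MACs, and the ratio depends only on the output channel count and the kernel size.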

Slide 34

Slide 34 text

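Designing efficient networks: Automated ways
- Ordinary model training: given a model whose training parameters are its weights, with the data as input and the prediction as output, search for the weights that minimize the loss
- Search and training with architecture search: add architecture parameters, and search for the weights and the architecture parameters that minimize the loss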
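The two settings contrasted on this slide can be written out as follows. The symbols are assumptions chosen for illustration, since the slide's own formulas did not survive extraction: f is the model, w the trainable weights, α the architecture parameters, x the input, y the target output, and L the loss.

```latex
% Ordinary training: search for the weights w that minimize the loss.
w^{*} = \arg\min_{w} \; \mathcal{L}\bigl(f(x; w),\, y\bigr)

% Architecture search: add architecture parameters \alpha and search for both.
(w^{*}, \alpha^{*}) = \arg\min_{w,\,\alpha} \; \mathcal{L}\bigl(f(x; w, \alpha),\, y\bigr)
```

The only structural difference is the extra variable α, which is why the search cost depends so heavily on how the architecture parameters are represented and optimized.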

Slide 35

Slide 35 text

Two influential yet costly approaches
MnasNet [7] (RL-based)
• Samples several thousand models from the search space
• Trains each sampled child model from scratch
• Can search a huge search space, but needs huge computing resources to iterate at a realistic pace
FBNet [8] (Differentiable)
• Adopts a DARTS-based search method
• Every operation in the search space has to be held in GPU memory, so GPU memory usage ends up being the problem: a sampled proxy dataset may be needed, or the batch size cannot be increased
Image credit: [7], [8]

Slide 36

Slide 36 text

Our approach
Single-Path NAS [9]
• A search-space formulation called the superkernel reduces search time that would otherwise take hundreds to thousands of GPU hours
• Performance comparison between MobileNet-V2 and a model generated with Single-Path NAS (SPNAS):

Device  SoC Generation (Snapdragon)  Model        ImageNet Top-1 Accuracy*  Latency (ms)*
A       845                          SPNAS        74.48                     77.90
A       845                          MobileNetV2  71.80                     76.36
B       808                          SPNAS        73.07                     113.92
B       808                          MobileNetV2  71.80                     162.82
C       670                          SPNAS        73.15                     92.14
C       670                          MobileNetV2  71.80                     111.85
D       801                          SPNAS        71.93                     84.65
D       801                          MobileNetV2  71.80                     120.82

* All results are for float32
Image credit: [9]
Is it realistic to define the search space based on MobileNet-V3 and search with a reasonably priced NAS method such as SPNAS?

Slide 37

Slide 37 text

Backend System for Edge
Stages: Training, Optimizer, Distribute
● Architecture search
● Weight pruning
● Dense-Sparse-Dense training
● Quantization
● K-means cluster
● Execution engine selection
● Model version management
● A/B testing
● Distribute model
・An edge product cannot be built on the client side alone
・A real-time bidirectional protocol
・A mechanism for generating device-efficient models
・A mechanism for distributing models dynamically, etc.

Slide 38

Slide 38 text

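Edge function architecture
Server side
• TensorFlow is used as the server-side DNN framework
• The pipeline is built with Lykeion
Client side
• TensorFlow Lite + MediaPipe are used on the client side
• With MediaPipe, pre-processing and post-processing can also be made efficient using SIMD and similar techniques
[Figure labels: (TF Lite), (Optional)]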

Slide 39

Slide 39 text

Conclusion Again

Slide 40

Slide 40 text

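Edge AI Technology
• Achieves real-time interaction, a big advantage in terms of UX
• However, it is also true that there is more to consider: the model, the runtime, and even the backend
• To deliver the target UX, accuracy, latency, and the other factors have to be balanced
  • Accuracy is not necessarily the top priority
• As the technology becomes mainstream, runtimes and modeling methods are expected to keep evolving rapidly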

Slide 41

Slide 41 text

References
[1] Lai, Liangzhen, Naveen Suda, and Vikas Chandra. "Not all ops are created equal!." arXiv preprint arXiv:1801.04326 (2018).
[2] Wu, Carole-Jean, David Brooks, Kevin Chen, Douglas Chen, Sy Choudhury, Marat Dukhan, Kim Hazelwood et al. "Machine learning at facebook: Understanding inference at the edge." In 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 331-344. IEEE, 2019.
[3] Howard, Andrew G., Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. "Mobilenets: Efficient convolutional neural networks for mobile vision applications." arXiv preprint arXiv:1704.04861 (2017).
[4] Sandler, Mark, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. "Mobilenetv2: Inverted residuals and linear bottlenecks." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4510-4520. 2018.
[5] Howard, Andrew, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang et al. "Searching for mobilenetv3." arXiv preprint arXiv:1905.02244 (2019).
[6] Zoph, Barret, and Quoc V. Le. "Neural architecture search with reinforcement learning." arXiv preprint arXiv:1611.01578 (2016).
[7] Tan, Mingxing, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, and Quoc V. Le. "Mnasnet: Platform-aware neural architecture search for mobile." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2820-2828. 2019.
[8] Wu, Bichen, Xiaoliang Dai, Peizhao Zhang, Yanghan Wang, Fei Sun, Yiming Wu, Yuandong Tian, Peter Vajda, Yangqing Jia, and Kurt Keutzer. "Fbnet: Hardware-aware efficient convnet design via differentiable neural architecture search." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 10734-10742. 2019.
[9] Stamoulis, Dimitrios, Ruizhou Ding, Di Wang, Dimitrios Lymberopoulos, Bodhi Priyantha, Jie Liu, and Diana Marculescu. "Single-path nas: Designing hardware-efficient convnets in less than 4 hours." arXiv preprint arXiv:1904.02877 (2019).

Slide 42

Slide 42 text

Thank you all for coming today