
Model Acceleration from an Architecture Perspective, 2019

yu4u
September 27, 2019


Slides presented at the 2nd Deep Learning Acceleration Meetup (DLAccel #2)
https://idein.connpass.com/event/139074/

Acceleration techniques are presented from the following six perspectives:
- Convolution factorization
- Pruning
- Architecture search (Neural Architecture Search; NAS)
- Early termination, dynamic computation graphs
- Distillation
- Quantization

Transcript

  1. Self-introduction
• Yusuke Uchida (DeNA Co., Ltd., AI Systems Dept., Deputy General Manager)
• Until 2017: image recognition and retrieval research at a telecom carrier's research lab
• 2016: obtained a Ph.D. (Information Science and Technology) as a working adult student
• 2017–present: joined DeNA mid-career; R&D on computer vision centered on deep learning
Twitter: https://twitter.com/yu4u GitHub: https://github.com/yu4u Qiita: https://qiita.com/yu4u SlideShare: https://www.slideshare.net/ren4yu medium: https://medium.com/@yu4u
  2. What does "acceleration" mean?
• Reducing the number of model parameters
• Reducing FLOPs (MACs)
• Reducing model file size
• Reducing inference time
• Reducing training time
These are subtly different, so be clear about which one matters when you deploy, and which one is actually being improved when you read a paper.
  3. FLOPs ≠ actual speed
• The conv operations are the only part that FLOPs capture
N. Ma, X. Zhang, H. Zheng, and J. Sun, "ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design," in Proc. of ECCV, 2018.
  4. Model acceleration
• Convolution factorization
• Pruning
• Architecture search (Neural Architecture Search; NAS)
• Early termination, dynamic computation graphs
• Distillation
• Quantization
  5. Computational cost of a convolution layer
• Input feature map: H x W x N
• Convolution kernel: K x K x N x M, written convKxK, M (e.g. conv 3x3, 64)
• Output feature map: H x W x M
• Cost of the convolution: H・W・N・K²・M, ignoring the bias term (a small cost calculator follows below)
[Figure: an H×W×N input feature map is convolved with K×K×N×M kernels to produce an H×W×M output feature map]
The cost of a convolution layer is proportional to
• the image / feature-map size (HW)
• the number of input and output channels (NM)
• the kernel size (K²)
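The formula above can be sanity-checked in a few lines. The sketch below is mine, not from the slides; the function name and the example layer sizes are arbitrary.

```python
def conv_macs(h, w, n, m, k):
    """Multiply-accumulate count of a KxK convolution (bias ignored).

    h, w: output feature-map height/width
    n: input channels, m: output channels, k: kernel size
    """
    return h * w * n * k * k * m

# conv3x3, 64 -> 128 channels on a 56x56 feature map
print(conv_macs(56, 56, 64, 128, 3))  # 231,211,008 MACs
```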
  6. Spatial factorization
• Factor a large convolution kernel into smaller ones
• e.g. replace a 5x5 convolution with two stacked 3x3 convolutions
• Both have the same receptive field, but the cost ratio is 25:18 in favor of the factorized version (see the sketch below)
• Inception-v2 [4] factors the initial 7x7 convolution into three 3x3 convolutions
• Also used in later implementations such as SENet and ShuffleNetV2 [18]
[Figure: feature map processed by conv5x5 vs. conv3x3 - conv3x3]
[4] C. Szegedy, et al., "Rethinking the Inception Architecture for Computer Vision," in Proc. of CVPR, 2016. [18] T. He, et al., "Bag of Tricks for Image Classification with Convolutional Neural Networks," in Proc. of CVPR, 2019.
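A quick numeric check of the 25:18 claim, reusing the cost formula from the previous slide. The feature-map size and channel count below are arbitrary; the ratio does not depend on them as long as the channel count stays constant through the block.

```python
def conv_macs(h, w, n, m, k):
    return h * w * n * k * k * m

h = w = 56
c = 64  # assume the channel count is unchanged through the block
single_5x5 = conv_macs(h, w, c, c, 5)
two_3x3 = 2 * conv_macs(h, w, c, c, 3)
print(two_3x3 / single_5x5)  # 18/25 = 0.72, with the same receptive field
```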
  7. Spatial factorization
• An nxn convolution can also be factored into 1xn and nx1 convolutions
[4] C. Szegedy, et al., "Rethinking the Inception Architecture for Computer Vision," in Proc. of CVPR, 2016.
  8. SqueezeNet
• Strategy:
• Use 1x1 filters instead of 3x3
• Reduce the number of channels fed into the 3x3 convolutions (1x1 convolutions for dimensionality reduction)
[Figure: Fire module — a squeeze layer (conv 1x1, s1x1) followed by an expand layer (conv 1x1, e1x1 and conv 3x3, e3x3) whose outputs are concatenated]
F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer, "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size," in arXiv:1602.07360, 2016.
  9. Factoring the spatial and channel directions (separable conv)
• Perform the spatial and channel convolutions independently
• Depthwise convolution (spatial direction)
  • Convolves each channel of the feature map separately
  • Cost: H・W・N・K²・M with M = N → H・W・K²・N
• Pointwise convolution (channel direction)
  • A 1x1 convolution
  • Cost: H・W・N・K²・M with K = 1 → H・W・N・M
• Depthwise + pointwise (separable)
  • Cost: H・W・N・(K² + M) ≈ H・W・N・M (since M >> K²)
  • A large reduction from H・W・N・K²・M
[Figure: standard, depthwise, and pointwise convolutions]
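A minimal PyTorch sketch of a separable convolution, assuming the usual depthwise-then-pointwise ordering described on the slide; the class name and layer sizes are my own choices.

```python
import torch
import torch.nn as nn

class SeparableConv2d(nn.Module):
    """Depthwise (spatial) + pointwise (channel) convolution."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        # groups=in_ch gives one KxK filter per input channel (depthwise)
        self.depthwise = nn.Conv2d(in_ch, in_ch, k, padding=k // 2,
                                   groups=in_ch, bias=False)
        # 1x1 convolution mixes information across channels (pointwise)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

x = torch.randn(1, 64, 56, 56)
print(SeparableConv2d(64, 128)(x).shape)  # torch.Size([1, 128, 56, 56])
```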
  10. Xception [6]
• A model built mostly from separable convs
[6] F. Chollet, "Xception: Deep learning with depthwise separable convolutions," in Proc. of CVPR, 2017.
  11. MobileNet [7]
• Makes heavy use of depthwise/pointwise convs
• Improved versions MobileNetV2 [13] and V3 [20] also exist
[Figure: a standard convolution vs. one MobileNet building block]
[7] A. Howard, et al., "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications," in arXiv:1704.04861, 2017. [13] M. Sandler, et al., "MobileNetV2: Inverted Residuals and Linear Bottlenecks," in Proc. of CVPR, 2018. [20] A. Howard, et al., "Searching for MobileNetV3," in Proc. of ICCV'19.
  12. MobileNetV1 vs. V2
[Figure: MobileNetV1 stacks depthwise conv + conv 1x1; MobileNetV2 uses conv 1x1 → depthwise conv → conv 1x1 across the spatial and channel directions]
• V2 adopts a bottleneck structure, which relatively reduces the cost of the conv1x1 layers (a sketch of the block follows below)
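A simplified stride-1 sketch of the MobileNetV2 inverted residual block (expand with 1x1, depthwise 3x3, project back with a linear 1x1). The expansion factor and channel counts are illustrative assumptions, not values from the slides.

```python
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    """Expand (1x1) -> depthwise 3x3 -> linear projection (1x1)."""
    def __init__(self, ch, expand=6):
        super().__init__()
        hidden = ch * expand
        self.block = nn.Sequential(
            nn.Conv2d(ch, hidden, 1, bias=False),            # expansion
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, 3, padding=1,
                      groups=hidden, bias=False),             # depthwise
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, ch, 1, bias=False),              # linear 1x1
            nn.BatchNorm2d(ch),
        )

    def forward(self, x):
        return x + self.block(x)  # residual over the narrow bottleneck

x = torch.randn(1, 32, 56, 56)
print(InvertedResidual(32)(x).shape)  # torch.Size([1, 32, 56, 56])
```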
  13. MnasNet
• An architecture-search method (covered later)
• Adds an SE module to the mobile inverted bottleneck (MBConv)
• MBConv3 (k5x5) → the bottleneck expands the channels 3x, and the depthwise kernel is 5x5
M. Tan, B. Chen, R. Pang, V. Vasudevan, M. Sandler, A. Howard, and Q. V. Le, "MnasNet: Platform-Aware Neural Architecture Search for Mobile," in Proc. of CVPR, 2019.
  14. EfficientNet
• Given a base network, finds the optimal allocation of additional depth, width, and resolution when scaling it up into a larger network
• The allocation is determined on EfficientNet-B0 (roughly MnasNet), and afterwards depth/width/resolution are increased exponentially in the same proportions (see the sketch below)
M. Tan and Q. V. Le, "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks," in Proc. of ICML, 2019.
The cost of a convolution layer is proportional to
• the image / feature-map size (HW)
• the number of input and output channels (NM)
• the kernel size (K²)
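A small sketch of compound scaling. The constants alpha, beta, gamma are the values I recall the EfficientNet paper reporting (chosen so that alpha * beta^2 * gamma^2 ≈ 2); treat them and the base resolution as assumptions rather than something taken from the slides.

```python
# Compound scaling: depth, width and resolution all grow exponentially
# with a single coefficient phi.
alpha, beta, gamma = 1.2, 1.1, 1.15  # quoted from memory; assumption

def scale(phi, base_depth=1.0, base_width=1.0, base_res=224):
    depth = base_depth * alpha ** phi          # layer-count multiplier
    width = base_width * beta ** phi           # channel multiplier
    res = int(round(base_res * gamma ** phi))  # input resolution
    return depth, width, res

for phi in range(4):  # roughly corresponds to B0..B3
    print(phi, scale(phi))
```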
  15. ShuffleNet [8]
• Replaces the conv1x1 that is the bottleneck in MobileNet with group conv1x1 + channel shuffle
• Group conv: splits the input feature maps into G groups and convolves each group separately (cost H・W・N・K²・M → H・W・N・K²・M / G)
• Channel shuffle: permutes the channel order; can be implemented with a reshape + transpose (see the sketch below)
[Figure: gconv 1x1 → channel shuffle → depthwise conv → gconv 1x1, across the spatial and channel directions]
[8] X. Zhang, et al., "ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices," in arXiv:1707.01083, 2017.
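The reshape + transpose trick in a few lines of PyTorch; the function name is mine, and the toy 8-channel input just makes the permutation visible.

```python
import torch

def channel_shuffle(x, groups):
    """Permute channels across groups via reshape + transpose."""
    n, c, h, w = x.shape
    x = x.view(n, groups, c // groups, h, w)
    x = x.transpose(1, 2).contiguous()
    return x.view(n, c, h, w)

x = torch.arange(8).float().view(1, 8, 1, 1)
# Channels 0..7 in two groups become interleaved: 0,4,1,5,2,6,3,7
print(channel_shuffle(x, groups=2).flatten())
```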
  16. ShuffleNet V2
• Argues that you should look at real speed on the target platform, not FLOPs
• Proposes four guidelines for efficient network design:
1. Keep the input and output channel counts of conv1x1 equal to minimize memory access
2. Excessive group convolution increases memory access cost
3. Fragmenting the network into too many small modules reduces parallelism
4. The cost of element-wise operations (ReLU, add, etc.) is not negligible
• The validity of these guidelines is demonstrated experimentally with toy networks
N. Ma, X. Zhang, H. Zheng, and J. Sun, "ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design," in Proc. of ECCV, 2018.
  17. ShuffleNet V2
• On top of these guidelines, proposes a new architecture
N. Ma, X. Zhang, H. Zheng, and J. Sun, "ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design," in Proc. of ECCV, 2018.
  18. ChannelNet [11]
• Performs 1-D convolutions along the channel direction
[11] H. Gao, Z. Wang, and S. Ji, "ChannelNets: Compact and Efficient Convolutional Neural Networks via Channel-Wise Convolutions," in Proc. of NIPS, 2018.
  19. And many more…
G. Huang, S. Liu, L. Maaten, and K. Weinberger, "CondenseNet: An Efficient DenseNet using Learned Group Convolutions," in Proc. of CVPR, 2018. T. Zhang, G. Qi, B. Xiao, and J. Wang, "Interleaved group convolutions for deep neural networks," in Proc. of ICCV, 2017. G. Xie, J. Wang, T. Zhang, J. Lai, R. Hong, and G. Qi, "IGCV2: Interleaved Structured Sparse Convolutional Neural Networks," in Proc. of CVPR, 2018. K. Sun, M. Li, D. Liu, and J. Wang, "IGCV3: Interleaved Low-Rank Group Convolutions for Efficient Deep Neural Networks," in BMVC, 2018. J. Zhang, "Seesaw-Net: Convolution Neural Network With Uneven Group Convolution," in arXiv:1905.03672, 2019.
  20. Unstructured vs. Structured Pruning
• Start from the convolution filters before pruning
• Unstructured pruning: the cost-vs-accuracy trade-off is better, but speed-ups require dedicated hardware
• Structured pruning (filter/channel pruning is the most common form): the result can be rebuilt as a network that simply has fewer channels, so it readily benefits from speed-ups (see the sketch below)
[Figure: M (output-channel) K×K filters before pruning, with unstructured and structured pruning patterns]
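A toy illustration of the two pruning styles on a single conv weight tensor. The sparsity ratio and the number of kept filters are arbitrary choices for the sketch; no fine-tuning or real importance criterion from a specific paper is implied.

```python
import torch

w = torch.randn(64, 32, 3, 3)  # conv weight: (out_ch, in_ch, k, k)

# Unstructured: zero out individual weights with small magnitude.
# The tensor stays dense, so speed-ups need sparse-aware hardware/kernels.
thresh = w.abs().flatten().kthvalue(int(0.9 * w.numel())).values
unstructured = w * (w.abs() > thresh)   # ~90% of the weights become zero

# Structured (filter/channel) pruning: drop whole output channels,
# which simply yields a smaller dense convolution.
filter_norms = w.abs().sum(dim=(1, 2, 3))          # L1 norm per filter
keep = filter_norms.argsort(descending=True)[:48]  # keep 48 of 64 filters
structured = w[keep]

print(unstructured.shape, structured.shape)  # (64,32,3,3) vs (48,32,3,3)
```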
  21. Deep Compression [23, 25, 26]
• Unstructured pruning
• Train with L2 regularization and set weights with small absolute values to zero
• Actually running it fast requires dedicated hardware [26]
[23] S. Han, et al., "Learning both Weights and Connections for Efficient Neural Networks," in Proc. of NIPS, 2015. [25] S. Han, et al., "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding," in Proc. of ICLR, 2016. [26] S. Han, et al., "EIE: Efficient Inference Engine on Compressed Deep Neural Network," in Proc. of ISCA, 2016.
  22. Pruning Filters for Efficient ConvNets [30]
• Structured (channel-level) pruning
• In each layer, prune the filters whose weights have the smallest sum of absolute values
• The per-layer pruning ratio is tuned by hand based on each layer's sensitivity to pruning
• Fine-tune after pruning
[30] H. Li, et al., "Pruning Filters for Efficient ConvNets," in Proc. of ICLR, 2017.
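A minimal sketch of L1-norm filter pruning on two consecutive conv layers, showing that the following layer must drop the corresponding input channels as well. The 50% pruning ratio is arbitrary and fine-tuning is omitted.

```python
import torch
import torch.nn as nn

conv1 = nn.Conv2d(32, 64, 3, padding=1)
conv2 = nn.Conv2d(64, 128, 3, padding=1)

# Rank conv1's filters by the L1 norm of their weights and keep the top half.
norms = conv1.weight.detach().abs().sum(dim=(1, 2, 3))
keep = norms.argsort(descending=True)[:32].sort().values  # keep channel order

pruned1 = nn.Conv2d(32, 32, 3, padding=1)
pruned1.weight.data = conv1.weight.data[keep].clone()
pruned1.bias.data = conv1.bias.data[keep].clone()

# The next layer loses the corresponding *input* channels.
pruned2 = nn.Conv2d(32, 128, 3, padding=1)
pruned2.weight.data = conv2.weight.data[:, keep].clone()
pruned2.bias.data = conv2.bias.data.clone()

x = torch.randn(1, 32, 28, 28)
print(pruned2(pruned1(x)).shape)  # torch.Size([1, 128, 28, 28])
```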
  23. Network Slimming [33]
• Train with an L1 loss on the batch-norm scale parameters γ
• After training, remove the channels whose γ is small, then fine-tune
[Figure: batch normalization normalizes each channel's input to zero mean and unit variance, then scales and shifts it with γ and β]
[33] Z. Liu, et al., "Learning Efficient Convolutional Networks through Network Slimming," in Proc. of ICCV, 2017.
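A sketch of the sparsity penalty on the BN scales. The penalty strength, the pruning threshold, and the dummy loss are placeholders I made up for illustration.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
)
lam = 1e-4  # strength of the L1 penalty on the BN scales (assumed value)

def bn_l1(model):
    # Sum of |gamma| over all BatchNorm layers; added to the task loss.
    return sum(m.weight.abs().sum()
               for m in model.modules() if isinstance(m, nn.BatchNorm2d))

x = torch.randn(8, 3, 32, 32)
loss = model(x).mean() + lam * bn_l1(model)  # dummy task loss for the sketch
loss.backward()

# After training, channels whose gamma is close to zero would be removed
# and the slimmed network fine-tuned.
gammas = model[1].weight.detach().abs()
print((gammas < 1e-3).sum().item(), "channels below threshold in this layer")
```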
  24. Channel Pruning [34]
• Selects which channels of a feature map to remove so that the error in the next feature map is minimized
• Instead of the L0-constrained problem, solves a relaxed Lasso formulation
• W is also adjusted by least squares
[34] Y. He, et al., "Channel Pruning for Accelerating Very Deep Neural Networks," in Proc. of ICCV, 2017.
  25. AutoML for Model Compression and Acceleration (AMC) [41]
• Uses reinforcement learning (off-policy actor-critic) to learn the optimal pruning ratio for each layer (the actual pruning uses existing methods)
• The input is the target layer's information plus the pruning results so far; the reward is –error rate × log(FLOPs) or log(#Params)
[41] Y. He, et al., "AMC - AutoML for Model Compression and Acceleration on Mobile Devices," in Proc. of ECCV, 2018.
  26. NetAdapt
• At each step, greedily prunes the layer that best satisfies the resource constraint defined for that step
• Uses a lookup table (LUT) to estimate resource usage
• Fine-tunes a little at every step
• Once the final resource target is reached, fine-tunes for longer and stops
T. Yang, A. Howard, B. Chen, X. Zhang, A. Go, M. Sandler, V. Sze, and H. Adam, "NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications," in Proc. of ECCV, 2018.
  27. Lottery Ticket Hypothesis (ICLR'19 Best Paper) [44]
• The hypothesis that a neural network contains a "winning" combination of sub-network structure and initial values, and that once you find it, that sub-network can be trained efficiently
• Unstructured pruning was able to find such a structure and initialization
https://www.slideshare.net/YosukeShinya/the-lottery-ticket-hypothesis-finding-small-trainable-neural-networks
[44] J. Frankle and M. Carbin, "The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks," in Proc. of ICLR, 2019.
  28. Network Pruning as Architecture Search [45]
• Claims that training a structurally pruned network from scratch gives results equal to or better than fine-tuning it
• In other words, pruning is not discovering which weights are important; it can be viewed as a Neural Architecture Search (NAS) over how many channels to allocate to each layer
• Notes that the Lottery Ticket Hypothesis experiments were unstructured, used only low learning rates, and only small networks
[45] Z. Liu, et al., "Rethinking the Value of Network Pruning," in Proc. of ICLR, 2019.
  29. Slimmable Neural Networks*
• Trains a single model that can run at multiple computational budgets (and accuracies)
• Incremental training does not reach good accuracy
• Joint training fails because the BN statistics differ between widths
→ give each switchable width its own BN layers! (see the sketch below)
• There are extensions to models whose width varies more continuously** and to greedy pruning from such a model (repeatedly removing the layer whose removal causes the smallest accuracy drop)***
* J. Yu, L. Yang, N. Xu, J. Yang, and T. Huang, "Slimmable Neural Networks," in Proc. of ICLR, 2019. ** J. Yu and T. Huang, "Universally Slimmable Networks and Improved Training Techniques," in arXiv:1903.05134, 2019. *** J. Yu and T. Huang, "Network Slimming by Slimmable Networks: Towards One-Shot Architecture Search for Channel Numbers," in arXiv:1903.11728, 2019.
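A simplified sketch of the "switchable BN" idea: one shared conv whose output channels are sliced per width, with a separate BatchNorm per width so the running statistics stay separate. Real slimmable networks also slim the input channels; the widths and sizes here are assumptions.

```python
import torch
import torch.nn as nn

class SlimmableConvBN(nn.Module):
    """Shared conv filters, one BatchNorm per selectable width."""
    def __init__(self, in_ch, out_ch, widths=(0.25, 0.5, 1.0)):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False)
        self.bns = nn.ModuleList(
            nn.BatchNorm2d(int(out_ch * w)) for w in widths)

    def forward(self, x, width_idx):
        out_ch = self.bns[width_idx].num_features
        w = self.conv.weight[:out_ch]          # slice of the shared filters
        y = nn.functional.conv2d(x, w, padding=1)
        return self.bns[width_idx](y)          # width-specific statistics

m = SlimmableConvBN(3, 64)
x = torch.randn(2, 3, 32, 32)
print(m(x, 0).shape, m(x, 2).shape)  # 16 vs. 64 output channels
```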
  30. MetaPruning
• Trains a PruningNet that outputs the weights of the pruned network
• The input to each block is a network encoding vector: random pruning ratios for the preceding and target layers
• Feeding in everything seems like it should help, but according to the authors it had no effect
• Can be trained end-to-end!
• Once training is done, search for models with a good accuracy-vs-speed trade-off (any search method works; here a GA is used)
Z. Liu, H. Mu, X. Zhang, Z. Guo, X. Yang, T. Cheng, and J. Sun, "MetaPruning: Meta Learning for Automatic Neural Network Channel Pruning," in Proc. of ICCV'19.
  31. Architecture search (NAS)
• Methods that automatically design network architectures
• Roughly categorized by search space, search method, and accuracy-estimation method
• Search space: global, cell-based
• Search method: reinforcement learning, evolutionary algorithms, gradient-based, random
• Accuracy estimation: full training, partial training, weight sharing, pruning-based search
T. Elsken, J. Metzen, and F. Hutter, "Neural Architecture Search: A Survey," in JMLR, 2019. M. Wistuba, A. Rawat, and T. Pedapati, "A Survey on Neural Architecture Search," in arXiv:1905.01392, 2019. https://github.com/D-X-Y/awesome-NAS
  32. NAS with Reinforcement Learning
• Search space: global; search method: REINFORCE
• An RNN controller generates the network structure
• Outputs the convolution-layer parameters and the presence/absence of skip connections
• The generated network is trained and its accuracy is used as the reward
B. Zoph and Q. V. Le, "Neural architecture search with reinforcement learning," in Proc. of ICLR, 2017.
  33. NAS with Reinforcement Learning
• The result of 800 GPUs running for 28 days
B. Zoph and Q. V. Le, "Neural architecture search with reinforcement learning," in Proc. of ICLR, 2017.
  34. NASNet [52]
• Search space: cell; search method: reinforcement learning (Proximal Policy Optimization)
• Applies domain knowledge to the global design and automatically designs only the cells that compose it → drastically reduces the search space
• A stack of (normal cell × N) and reduction cells
• A reduction cell starts with a strided op that downsamples the feature map
• The channel count is doubled after each reduction cell
[52] B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le, "Learning transferable architectures for scalable image recognition," in Proc. of CVPR, 2018.
  35. How the NASNet controller works
1. Select hidden states 1 and 2 (※1)
2. Select the ops to apply to them (※2)
3. Select the op that combines them (add or concat); the result becomes a new hidden state
※1 Hidden states: the green blocks plus h_i and h_{i-1}
※2 The candidate ops for each hidden state
[52] B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le, "Learning transferable architectures for scalable image recognition," in Proc. of CVPR, 2018.
  36. How the NASNet controller works (same steps; the figure highlights the selected hidden states)
  37. How the NASNet controller works (same steps; ops such as sep 3x3 and avg 3x3 have been selected)
  38. How the NASNet controller works (same steps; the two results are combined with concat into a new hidden state)
  39. ENAS [54]
• Search space: cell; search method: reinforcement learning (REINFORCE)
• Simultaneously trains an RNN controller that outputs the cell structure and one large computation graph (network) that contains every network the controller can output as a subgraph
→ generated networks no longer need to be trained from scratch (1 GPU for 0.45 days!)
• Single shot, weight share
• See the excellent slides* for details
[54] H. Pham, M. Y. Guan, B. Zoph, Q. V. Le, and J. Dean, "Efficient Neural Architecture Search via Parameter Sharing," in Proc. of ICML, 2018. * https://www.slideshare.net/tkatojp/efficient-neural-architecture-search-via-parameters-sharing-icml2018
  40. FBNet [61]
• Gradient-based, like DARTS
• Holds each op's measured on-device latency in a lookup table
• Applies a loss that takes the processing time into account (cross entropy × processing-time term; a sketch follows below)
• Each block can have a different structure
[61] B. Wu, et al., "FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search," in Proc. of CVPR, 2019.
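A paraphrased sketch of a latency-aware objective with a lookup table: the expected latency of a block is the op probabilities times the measured per-op latencies, and it multiplies the cross-entropy loss. The LUT values, trade-off constants, and the exact functional form are placeholders; the paper's formulation may differ in detail.

```python
import torch
import torch.nn.functional as F

# Hypothetical per-op latencies (ms) measured on the target device (LUT).
latency_lut = torch.tensor([1.2, 0.8, 2.5, 0.3])

# Architecture parameters for one block: a distribution over candidate ops.
theta = torch.zeros(4, requires_grad=True)

def expected_latency(theta):
    # Differentiable "soft" latency: op probabilities times LUT entries.
    return (F.softmax(theta, dim=0) * latency_lut).sum()

logits = torch.randn(8, 10, requires_grad=True)  # stand-in network output
target = torch.randint(0, 10, (8,))
ce = F.cross_entropy(logits, target)

alpha, beta = 0.2, 0.6  # placeholder trade-off constants
loss = ce * alpha * expected_latency(theta).log() ** beta
loss.backward()  # gradients flow to both the weights and theta
print(loss.item())
```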
  41. Random-search approaches
• Weight share + random search (ASHA) works well*
• Asynchronous Successive Halving (ASHA): trains many models in parallel, keeping only the promising ones and pruning the rest
• Available in Optuna!** (see the sketch below)
• Making the search space the graphs produced by a random DAG-generation algorithm works surprisingly well***
* L. Li and A. Talwalkar, "Random search and reproducibility for neural architecture search," in arXiv:1902.07638, 2019. ** https://www.slideshare.net/shotarosano5/automl-in-neurips-2018 *** S. Xie, A. Kirillov, R. Girshick, and K. He, "Exploring Randomly Wired Neural Networks for Image Recognition," in arXiv:1904.01569, 2019.
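A minimal single-process sketch of successive halving in Optuna (the asynchronous behavior of ASHA comes from running trials in parallel workers, which is omitted here). The objective is a fake training loop whose "accuracy" curve is made up for illustration.

```python
import optuna

def objective(trial):
    lr = trial.suggest_float("lr", 1e-4, 1e-1, log=True)
    acc = 0.0
    for step in range(30):            # stand-in for a training loop
        acc += lr * (1.0 - acc)       # fake "accuracy" that improves
        trial.report(acc, step)       # report intermediate value
        if trial.should_prune():      # the pruner stops unpromising trials
            raise optuna.TrialPruned()
    return acc

study = optuna.create_study(
    direction="maximize",
    pruner=optuna.pruners.SuccessiveHalvingPruner())
study.optimize(objective, n_trials=20)
print(study.best_params)
```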
  42. And more [58] H. Cai, L. Zhu, and S. Han, "ProxylessNAS:

    Direct Neural Architecture Search on Target Task and Hardware," in Proc. of ICLR, 2019. [59] M. Tan, B. Chen, R. Pang, V. Vasudevan, M. Sandler, A. Howard, and Q. V. Le, "MnasNet: Platform-Aware Neural Architecture Search for Mobile," in Proc. of CVPR, 2019. [60] X. Dai, et al., "ChamNet: Towards Efficient Network Design through Platform-Aware Model Adaptation," in Proc. of CVPR, 2019. [62] D. Stamoulis, et al., "Single-Path NAS: Device-Aware Efficient ConvNet Design," in Proc. of ICMLW, 2019. 58
  43. Spatially Adaptive Computation Time (SACT) [66]
• ACT: each ResBlock outputs a halting score; once the cumulative score exceeds 1, the remaining blocks are skipped (doing this per spatial position as well gives SACT)
• A loss term (gradient) for the computation cost is added
[66] M. Figurnov, et al., "Spatially Adaptive Computation Time for Residual Networks," in Proc. of CVPR, 2017.
  44. Distilling the Knowledge in a Neural Network [77]
• A trained (teacher) model and the model being trained (student) both see the training images
• Soft targets are obtained by raising the softmax temperature T above the usual T = 1
• The student is trained on both the ground-truth labels (hard targets) and the teacher's outputs (soft targets); a sketch of the loss follows below
[77] G. Hinton, et al., "Distilling the Knowledge in a Neural Network," in Proc. of NIPS Workshop, 2014.
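A minimal sketch of the distillation loss combining hard and soft targets. The temperature and the mixing weight are placeholder values, not numbers from the slides.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Blend hard-target CE with soft-target KL at temperature T."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * T * T  # T^2 keeps gradient scale
    return alpha * hard + (1 - alpha) * soft

s = torch.randn(8, 10, requires_grad=True)   # student logits
t = torch.randn(8, 10)                       # teacher logits
y = torch.randint(0, 10, (8,))               # hard labels
print(distillation_loss(s, t, y).item())
```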
  45. More recent work (just a quick mention)
B. Heo, et al., "A Comprehensive Overhaul of Feature Distillation," in Proc. of ICCV, 2019.
  46. Quantization
• Quantizing network parameters and related tensors reduces model size and speeds up training and inference
• What gets quantized: weights, activations (feature maps), gradients, errors
• Quantization schemes: linear, log, non-linear / scalar, vector, product quantization
• Bit widths: 1 bit (binary), ternary (-1, 0, 1), 8 bit, 16 bit, arbitrary
• The benefits often require dedicated hardware
• Half precision / mixed precision* is supported by general-purpose hardware and frameworks
* https://github.com/NVIDIA/apex
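To make the linear (affine) case concrete, here is a small sketch that quantizes a float tensor to an 8-bit range with a scale and zero-point and measures the round-off error. Function and variable names are mine.

```python
import torch

def quantize_linear(x, num_bits=8):
    """Affine (scale + zero-point) quantization to the uint8 range."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = (-x.min() / scale).round().clamp(qmin, qmax)
    q = (x / scale + zero_point).round().clamp(qmin, qmax)
    dequant = (q - zero_point) * scale       # reconstruct floats
    return q.to(torch.uint8), scale, zero_point, dequant

w = torch.randn(64, 32, 3, 3)
q, scale, zp, w_hat = quantize_linear(w)
print(scale.item(), (w - w_hat).abs().max().item())  # max round-off error
```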
  47. WAGE [96]
• Quantizes all of the weights (W), activations (A), gradients (G), and errors (E)
[96] S. Wu, et al., "Training and Inference with Integers in Deep Neural Networks," in Proc. of ICLR, 2018.
  48. WAGE [96]
• Weights (W), activations (A), gradients (G), errors (E)
[Figure: the quantization flow; the annotated stage is binary]
[96] S. Wu, et al., "Training and Inference with Integers in Deep Neural Networks," in Proc. of ICLR, 2018.
  49. Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference [97]
• Training simulates quantization so that inference can run mainly on uint8 arithmetic
• An official TensorFlow implementation exists*
[97] B. Jacob, et al., "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference," in Proc. of CVPR, 2018. * https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/quantize/README.md
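A toy illustration of the "simulate quantization during training" idea using a straight-through estimator: the forward pass rounds to an 8-bit grid, while the backward pass lets gradients through unchanged. This is a generic sketch of fake quantization, not the TensorFlow implementation referenced on the slide.

```python
import torch

class FakeQuant(torch.autograd.Function):
    """Simulate uint8 quantization in the forward pass; pass gradients
    straight through in the backward pass (straight-through estimator)."""
    @staticmethod
    def forward(ctx, x, num_bits=8):
        qmax = 2 ** num_bits - 1
        scale = (x.max() - x.min()) / qmax
        zp = (-x.min() / scale).round()
        return ((x / scale + zp).round().clamp(0, qmax) - zp) * scale

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None  # identity gradient w.r.t. x

w = torch.randn(16, 16, requires_grad=True)
y = FakeQuant.apply(w).sum()
y.backward()
print(w.grad.abs().sum().item())  # gradients still flow to the weights
```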
  50. General-purpose acceleration techniques covered
• Convolution factorization
• Pruning
• Architecture search (Neural Architecture Search; NAS)
• Early termination, dynamic computation graphs
• Distillation
• Quantization
  51. Summary
• NAS is now within everyone's reach
  • Single shot, weight share
  • Optimizes real speed (mobile device-aware), not FLOPs
• The base modules (cells) are still designed by hand
  • Ironically, earlier work auto-designed the cells instead (heavily wired cells are undesirable)
  • The space does not feel thoroughly explored (it resembles a greedy grid search)
• Module design, pruning, and NAS are merging into one
• Going forward: not just reusing a lightweight backbone, but architectures optimized per task (these already exist, to be fair)
  52. Convolution factorization [1] L. Sifre and S. Mallat, "Rotation, Scaling and

    Deformation Invariant Scattering for Texture Discrimination," in Proc. of CVPR, 2013. [2] L. Sifre, "Rigid-motion Scattering for Image Classification, in Ph.D. thesis, 2014. [3] M. Lin, Q. Chen, and S. Yan, "Network in Network," in Proc. of ICLR, 2014. [4] C. Szegedy, et al., "Rethinking the Inception Architecture for Computer Vision," in Proc. of CVPR, 2016. [5] F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer, "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size," in arXiv:1602.07360, 2016. [6] F. Chollet, "Xception: Deep learning with depthwise separable convolutions," in Proc. of CVPR, 2017. [7] A. Howard, et al., "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications," in arXiv:1704.04861, 2017. [8] X. Zhang, et al., "ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices," in arXiv:1707.01083, 2017. [9] B. Wu, et al., "Shift: A Zero FLOP, Zero Parameter," in arXiv:1711.08141, 2017. [10] N. Ma, X. Zhang, H. Zheng, and J. Sun, "ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design," in Proc. of ECCV, 2018. [11] H. Gao, Z. Wang, and S. Ji, "ChannelNets: Compact and Efficient Convolutional Neural Networks via Channel-Wise Convolutions", in Proc. of NIPS, 2018. [12] G. Huang, S. Liu, L. Maaten, and K. Weinberger, "CondenseNet: An Efficient DenseNet using Learned Group Convolutions," in Proc. of CVPR, 2018. [13] M. Sandler, et al., "MobileNetV2: Inverted Residuals and Linear Bottlenecks," in Proc. of CVPR, 2018. [14] G. Xie, J. Wang, T. Zhang, J. Lai, R. Hong, and G. Qi, "IGCV2: Interleaved Structured Sparse Convolutional Neural Networks, in Proc. of CVPR, 2018. 82
  53. Convolution factorization [15] T. Zhang, G. Qi, B. Xiao, and J.

    Wang, "Interleaved group convolutions for deep neural networks," in Proc. of ICCV, 2017. [16] Z. Qin, Z. Zhang, X. Chen, and Y. Peng, "FD-MobileNet: Improved MobileNet with a Fast Downsampling Strategy," in Proc. of ICIP, 2018. [17] K. Sun, M. Li, D. Liu, and J. Wang, "IGCV3: Interleaved Low-Rank Group Convolutions for Efficient Deep Neural Networks," in BMVC, 2018. [18] T. He, et al., "Bag of Tricks for Image Classification with Convolutional Neural Networks," in Proc. of CVPR, 2019. [19] Y. Chen, et al., "Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution," in arXiv:1904.05049, 2019. [20] A. Howard, et al., "Searching for MobileNetV3," in arXiv:1905.02244, 2019. [21] J. Zhang, "Seesaw-Net: Convolution Neural Network With Uneven Group Convolution," in arXiv:1905.03672, 2019. 83
  54. Pruning [22] Y. LeCun, J. Denker, and S. Solla, "Optimal

    Brain Damage," in Proc. of NIPS, 1990. [23] S. Han, J. Pool, J. Tran, and W. Dally, "Learning both Weights and Connections for Efficient Neural Networks," in Proc. of NIPS, 2015. [24] W. Wen, C. Wu, Y. Wang, Y. Chen, and H. Li, "Learning Structured Sparsity in Deep Neural Networks," in Proc. of NIPS, 2016. [25] S. Han, et al., "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding," in Proc. of ICLR, 2016. [26] S. Han, J. Pool, J. Tran, and W. Dally, "EIE: Efficient Inference Engine on Compressed Deep Neural Network," in Proc. of ISCA, 2016. [27] S. Anwar, K. Hwang, and W. Sung, "Structured Pruning of Deep Convolutional Neural Networks," in JETC, 2017. [28] S. Changpinyo, M. Sandler, and A. Zhmoginov, "The Power of Sparsity in Convolutional Neural Networks," in arXiv:1702.06257, 2017. [29] S. Scardapane, D. Comminiello, A. Hussain, and A. Uncini, "Group Sparse Regularization for Deep Neural Networks," in Neurocomputing, 2017. [30] H. Li, et al., "Pruning Filters for Efficient ConvNets," in Proc. of ICLR, 2017. [31] P. Molchanov, S. Tyree, T. Karras, T. Aila, and J. Kautz, "Pruning Convolutional Neural Networks for Resource Efficient Inference," in Proc. of ICLR, 1017. [32] D. Molchanov, A. Ashukha, and D. Vetrov, "Variational Dropout Sparsifies Deep Neural Networks," in Proc. of ICML, 2017. [33] Z. Liu, et al., "Learning Efficient Convolutional Networks through Network Slimming," in Proc. of ICCV, 2017. [34] Y. He, et al., "Channel Pruning for Accelerating Very Deep Neural Networks," in Proc. of ICCV, 2017. [35] J. Luo, et al., "ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression," in Proc. of ICCV, 2017. [36] C. Louizos, K. Ullrich, and M. Welling, "Bayesian Compression for Deep Learning," in Proc. of NIPS, 2017. 84
  55. Pruning [37] K. Neklyudov, D. Molchanov, A. Ashukha, and D. Vetrov,

    "Structured Bayesian Pruning via Log-Normal Multiplicative Noise," in Proc. of NIPS, 2017. [38] M. Zhu and S. Gupta, "To prune, or not to prune: exploring the efficacy of pruning for model compression," in Proc. of ICLRW, 2018. [39] T. Yang, Y. Chen, and V. Sze, "Designing Energy-Efficient Convolutional Neural Networks Using Energy-Aware Pruning," in Proc. of CVPR, 2017. [40] Y. He, G. Kang, X. Dong, Y. Fu, and Y. Yang, "Soft Filter Pruning for Accelerating Deep Convolutional Neural Networks," in Proc. of IJCAI, 2018. [41] Y. He, et al., "AMC - AutoML for Model Compression and Acceleration on Mobile Devices," in Proc. of ECCV, 2018. [42] T. Yang, A. Howard, B. Chen, X. Zhang, A. Go, M. Sandler, V. Sze, and H. Adam, "NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications," in Proc. of ECCV, 2018. [43] J. Luo and J. Wu, "AutoPruner: An End-to-End Trainable Filter Pruning Method for Efficient Deep Model Inference," in arXiv:1805.08941, 2018. [44] J. Frankle and M. Carbin, "The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks," in Proc. of ICLR, 2019. [45] Z. Liu, et al., "Rethinking the Value of Network Pruning," in Proc. of ICLR, 2019. [46] J. Yu, L. Yang, N. Xu, J. Yang, and T. Huang, "Slimmable Neural Networks," in Proc. of ICLR, 2019. [47] S. Lin, R. Ji, C. Yan, B. Zhang, L. Cao, Q. Ye, F. Huang, and D. Doermann, "Towards Optimal Structured CNN Pruning via Generative Adversarial Learning," in Proc. of CVPR, 2019. GAN [48] J. Yu and T. Huang, "Universally Slimmable Networks and Improved Training Techniques," in arXiv:1903.05134, 2019. [49] J. Yu and T. Huang, "Network Slimming by Slimmable Networks: Towards One-Shot Architecture Search for Channel Numbers," in arXiv:1903.11728, 2019. [50] Z. Liu, H. Mu, X. Zhang, Z. Guo, X. Yang, T. Cheng, and J. Sun, "MetaPruning: Meta Learning for Automatic Neural Network Channel Pruning," in arXiv:1903.10258, 2019. 85
  56. Architecture search [51] B. Zoph and Q. V. Le, "Neural architecture

    search with reinforcement learning," in Proc. of ICLR, 2017. [52] B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le, "Learning transferable architectures for scalable image recognition," in Proc. of CVPR, 2018. [53] C. Liu, et al., "Progressive Neural Architecture Search," in Proc. of ECCV, 2018. [54] H. Pham, M. Y. Guan, B. Zoph, Q. V. Le, and Jeff Dean, "Efficient Neural Architecture Search via Parameter Sharing," in Proc. of ICML, 2018. [55] H. Liu, K. Simonyan, O. Vinyals, C. Fernando, and K. Kavukcuoglu, "Hierarchical Representations for Efficient Architecture Search," in Proc. of ICLR, 2018. [56] E. Real, A. Aggarwal, Y. Huang, Q. V. Le, "Regularized Evolution for Image Classifier Architecture Search," in Proc. of AAAI, 2019. [57] H. Liu, K. Simonyan, and Y. Yang, "DARTS: Differentiable Architecture Search," in Proc. of ICLR, 2019. [58] H. Cai, L. Zhu, and S. Han, "ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware," in Proc. of ICLR, 2019. [59] M. Tan, B. Chen, R. Pang, V. Vasudevan, M. Sandler, A. Howard, and Q. V. Le, "MnasNet: Platform-Aware Neural Architecture Search for Mobile," in Proc. of CVPR, 2019. [60] X. Dai, et al., "ChamNet: Towards Efficient Network Design through Platform-Aware Model Adaptation," in Proc. of CVPR, 2019. [61] B. Wu, et al., "FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search", in Proc. of CVPR, 2019. [62] D. Stamoulis, et al., "Single-Path NAS: Device-Aware Efficient ConvNet Design," in Proc. of ICMLW, 2019. [63] L. Li and A. Talwalkar, "Random search and reproducibility for neural architecture search," in arXiv:1902.07638, 2019. 86
  57. Early termination, dynamic computation graphs [64] Y. Guo, A. Yao, and Y. Chen, "Dynamic

    Network Surgery for Efficient DNNs," in Proc. of NIPS, 2016. [65] S. Teerapittayanon, et al., "BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks," in Proc. of ICPR, 2016. [66] M. Figurnov, et al., "Spatially Adaptive Computation Time for Residual Networks," in Proc. of CVPR, 2017. [67] T. Bolukbasi, J. Wang, O. Dekel, and V. Saligrama, "Adaptive Neural Networks for Efficient Inference," in Proc. of ICML, 2017. [68] J. Lin, et al., "Runtime Neural Pruning," in Proc. of NIPS, 2017. [69] G. Huang, D. Chen, T. Li, F. Wu, L. Maaten, and K. Weinberger, "Multi-Scale Dense Networks for Resource Efficient Image Classification," in Proc. of ICLR, 2018. [70] X. Wang, F. Yu, Z. Dou, T. Darrell, and J. Gonzalez, "SkipNet: Learning Dynamic Routing in Convolutional Networks," in Proc. of ECCV, 2018. [71] A. Veit and S. Belongie, "Convolutional Networks with Adaptive Inference Graphs," in Proc. of ECCV, 2018. [72] L. Liu and J. Deng, "Dynamic Deep Neural Networks: Optimizing Accuracy-Efficiency Trade-Offs by Selective Execution," in Proc. of AAAI, 2018. [73] Z. Wu, et al., "BlockDrop: Dynamic Inference Paths in Residual Networks," in Proc. of CVPR, 2018. [74] R, Yu, et al., "NISP: Pruning Networks using Neuron Importance Score Propagation," in Proc. of CVPR, 2018. [75] J. Kuen, X. Kong, Z. Lin, G. Wang, J. Yin, S. See, and Y. Tan, "Stochastic Downsampling for Cost-Adjustable Inference and Improved Regularization in Convolutional Networks," in Proc. of CVPR, 2018. [76] X. Gao, Y. Zhao, L. Dudziak, R. Mullins, and C. Xu, "Dynamic Channel Pruning: Feature Boosting and Suppression," in Proc. of ICLR, 2019. 87
  58. Distillation [77] G. Hinton, et al., "Distilling the Knowledge in

    a Neural Network," in Proc. of NIPS Workshop, 2014. [78] J. Ba and R. Caruana, "Do Deep Nets Really Need to be Deep?," in Proc. of NIPS, 2014. [79] A. Romero, et al., "FitNets: Hints for Thin Deep Nets," in Proc. of ICLR, 2015. [80] T. Chen, I. Goodfellow, and J. Shlens, "Net2Net: Accelerating Learning via Knowledge Transfer," in Proc. of ICLR, 2016. [81] G. Urban, et al., "Do Deep Convolutional Nets Really Need to be Deep and Convolutional?," in Proc. of ICLR, 2017. [82] J. Yim, D. Joo, J. Bae, and J. Kim, "A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning," in Proc. of CVPR, 2017. [83] A. Mishra and D. Marr, "Apprentice: Using Knowledge Distillation Techniques To Improve Low-Precision Network Accuracy," in Proc. of ICLR, 2018. [84] T. Furlanello, Z. Lipton, M. Tschannen, L. Itti, and A. Anandkumar, "Born Again Neural Networks," in Proc. of ICML, 2018. [85] Y. Zhang, T. Xiang, T. Hospedales, and H. Lu, "Deep Mutual Learning," in Proc. of CVPR, 2018. [86] X. Lan, X. Zhu, and S. Gong, "Knowledge Distillation by On-the-Fly Native Ensemble," in Proc. of NIPS, 2018. [87] W. Park, D. Kim, Y. Lu, and M. Cho, "Relational Knowledge Distillation," in Proc. of CVPR, 2019. 88
  59. Quantization [88] M. Courbariaux, Y. Bengio, and J. David, "BinaryConnect:

    Training Deep Neural Networks with binary weights during propagations," in Proc. of NIPS, 2015. [89] I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, "Binarized Neural Networks," in Proc. of NIPS, 2016. [90] M. Rastegari, V. OrdonezJoseph, and R. Farhadi, "XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks," in Proc. of ECCV, 2016. [91] J. Wu, C. Leng, Y. Wang, Q. Hu, and J. Cheng, "Quantized Convolutional Neural Networks for Mobile Devices," in Proc. of CVPR, 2016. [92] F. Li, B. Zhang, and B. Liu, "Ternary Weight Networks," in arXiv:1605.04711, 2016. [93] S. Zhou, Y. Wu, Z. Ni, X. Zhou, H. Wen, and Y. Zou, "DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients," in arXiv:1606.06160, 2016. [94] C. Zhu, S. Han, H. Mao, and W. Dally, "Trained Ternary Quantization," in Proc. of ICLR, 2017. [95] A. Zhou, A. Yao, Y. Guo, L. Xu, and Y. Chen, "Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights," in Proc. of ICLR, 2017. [96] S. Wu, G. Li, F. Chen, and L. Shi, "Training and Inference with Integers in Deep Neural Networks," in Proc. of ICLR, 2018. [97] B. Jacob, et al., "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference," in Proc. of CVPR, 2018. [98] Z. Liu, B. Wu, W. Luo, X. Yang, W. Liu, and K. Cheng, "Bi-Real Net: Enhancing the Performance of 1-bit CNNs With Improved Representational Capability and Advanced Training Algorithm," in Proc. of ECCV, 2018. [99] N. Wang, J. Choi, D. Brand, C. Chen, and K. Gopalakrishnan, "Training Deep Neural Networks with 8-bit Floating Point Numbers," in Proc. of NIPS, 2018. [100] G. Yang, et al., "SWALP : Stochastic Weight Averaging in Low-Precision Training," in Proc. of ICML, 2019. 89