CompML: Introduction to Neural Network Pruning

CompML Introduction to Neural Network Pruning Ryohei Izawa

CompML TL;DR
• Pruning sets some of a network's weights to zero in order to reduce the number of parameters and the amount of computation.
• Most pruning methods derive from the method of Han et al., 2015 [1] (train ⇨ prune ⇨ retrain). They differ mainly in the unit of pruning, the rule used to decide what to prune, the pruning schedule, and the retraining method [2].
• Existing methods are evaluated on different models and datasets, which makes them hard to compare.

CompML Overview of Pruning

CompML What Is Pruning?
Pruning sets some of a network's weights to zero in order to reduce the number of parameters and the amount of computation.
Source: Han et al., 2015 [1]

CompML Growing Research Interest in Pruning
The number of papers on pruning increases every year.
Figure (Mirkes [3]): Number of published papers with keywords "neural" AND "network" AND "pruning" (dotted line in the top graph), with keywords "neural" AND "network" (solid line in the top graph), and their ratio.

CompML Pruning Methods
Most neural network pruning methods derive from the method of Han et al., 2015 [1]:
1. Train the network to convergence.
2. Prune the network based on scores assigned to its parameters (weights) or structural elements.
3. Retrain the pruned network.
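
The three steps can be sketched on a toy problem. This is a minimal illustration, not Han et al.'s implementation: the linear-regression model, the `train` and `magnitude_mask` helpers, and the 50% sparsity are all assumptions chosen to keep the example small.

```python
import numpy as np

def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that zeroes the `sparsity` fraction of smallest-|w| entries."""
    k = int(round(sparsity * weights.size))  # number of weights to remove
    mask = np.ones(weights.size)
    order = np.argsort(np.abs(weights), axis=None)  # ascending by magnitude
    mask[order[:k]] = 0.0
    return mask.reshape(weights.shape)

def train(w, x, y, mask, steps=200, lr=0.1):
    """Gradient descent on mean squared error; masked weights stay at zero."""
    w = w * mask
    for _ in range(steps):
        grad = 2.0 * x.T @ (x @ w - y) / len(x)
        w = (w - lr * grad) * mask
    return w

# Toy data (assumption): only 3 of the 8 inputs actually matter.
rng = np.random.default_rng(0)
x = rng.normal(size=(256, 8))
true_w = np.array([3.0, -2.0, 0.0, 0.0, 1.5, 0.0, 0.0, 0.0])
y = x @ true_w

w = train(rng.normal(size=8), x, y, np.ones(8))  # 1. train to convergence
mask = magnitude_mask(w, sparsity=0.5)           # 2. score by |w| and prune
w = train(w, x, y, mask)                         # 3. retrain the pruned network
print(mask, np.round(w, 3))
```

Here the score is simply the absolute value of each weight; other scoring rules only change how `magnitude_mask` ranks the weights.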

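CompML Expected Benefits of Pruning
The goal is to achieve the following while matching or improving on the accuracy of the original network:
• Lower memory usage
• Less computation
• Lower energy consumption
In general, however, computational efficiency and accuracy trade off against each other.
⇨ Many methods have been proposed to improve this trade-off.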

CompML Main Differences Between Pruning Methods

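CompML Where Pruning Methods Differ
Many pruning methods have been proposed; they differ mainly in the following four respects (Blalock et al., 2020 [2]):
• Structure: the unit at which the network is pruned
• Scoring: the rule used to decide what is pruned
• Scheduling: when pruning is applied
• Fine-Tuning: how the network is retrained (or whether it is retrained at all)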

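CompML Structure: The Unit of Pruning
Pruning whole layers or filters is called structured pruning; pruning individual weights is called unstructured pruning.
Depending on the unit of pruning, computational efficiency and accuracy trade off against each other.

Pruning unit:             layer | filter | kernel | weight
Granularity:              coarse / structured ⇨ fine / unstructured
Computational efficiency: high ⇨ low
Accuracy:                 low ⇨ high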
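
The difference between the two ends of this scale can be made concrete with a small sketch. The tensor layout `(filters, in_channels, kh, kw)`, the 50% pruning ratio, and the L1-norm filter score are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
conv_w = rng.normal(size=(4, 3, 3, 3))  # (filters, in_channels, kh, kw)

# Unstructured: zero the 50% of individual weights with the smallest |w|.
threshold = np.median(np.abs(conv_w))
unstructured = conv_w * (np.abs(conv_w) > threshold)

# Structured: remove the 2 whole filters with the smallest L1 norm.
filter_norms = np.abs(conv_w).sum(axis=(1, 2, 3))
keep = np.sort(np.argsort(filter_norms)[2:])
structured = conv_w[keep]

print(unstructured.shape, np.count_nonzero(unstructured))  # same shape, scattered zeros
print(structured.shape)                                    # smaller dense tensor
```

The unstructured result keeps its shape and needs sparse kernels to run faster, while the structured result is an ordinary dense tensor that existing libraries can execute directly.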

CompML Scoring: The Pruning Rule
Many methods prune based on the absolute value of a parameter, an importance coefficient, or a similar score.
• Local: compare scores within each subcomponent of the network (layer, filter) and prune a fraction of the lowest-scoring parameters in each (Han et al., 2015 [1]).
• Global: compare scores across the whole network and prune the lowest-scoring parameters (Lee et al., 2019 [4]; Frankle et al., 2019 [5]).
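
Local and global comparison can give very different per-layer sparsity when layers have different weight scales. The following sketch uses absolute value as the score; the helper names `local_masks` and `global_masks` and the two random layers are hypothetical.

```python
import numpy as np

def local_masks(layers, sparsity):
    """Prune the same fraction inside every layer (scores compared per layer)."""
    masks = []
    for w in layers:
        threshold = np.quantile(np.abs(w), sparsity)
        masks.append(np.abs(w) > threshold)
    return masks

def global_masks(layers, sparsity):
    """Prune against a single threshold computed over all layers together."""
    all_scores = np.concatenate([np.abs(w).ravel() for w in layers])
    threshold = np.quantile(all_scores, sparsity)
    return [np.abs(w) > threshold for w in layers]

rng = np.random.default_rng(0)
layers = [rng.normal(scale=1.0, size=(10, 10)),  # large-magnitude layer
          rng.normal(scale=0.1, size=(10, 10))]  # small-magnitude layer

for name, masks in [("local", local_masks(layers, 0.5)),
                    ("global", global_masks(layers, 0.5))]:
    print(name, [float(1 - m.mean()) for m in masks])  # sparsity per layer
```

With local comparison both layers end up 50% sparse; with global comparison the small-magnitude layer absorbs most of the pruning.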

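CompML Scheduling: When to Prune
Methods differ in how much of the network they prune at each step.
• Prune all of the targeted weights at once in a single step.
• Prune iteratively, removing the same number of weights at each step.
• Vary the pruning rate according to a more complex function.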

CompML Fine-Tuning: How to Retrain
Most commonly, the network is retrained starting from the trained weights it had before pruning.
Other approaches:
• Reinitialize the network to the values it had before training (Frankle et al., 2019 [5])
• Reinitialize the network with random values (Liu et al., 2019 [6])
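
The three starting points differ only in which weights are placed under the pruning mask before retraining. A minimal sketch, in which `w_init`, `w_trained`, the mask, and the `starting_weights` helper are all stand-ins invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
w_init = rng.normal(size=(4, 4))              # values before training
w_trained = w_init + rng.normal(size=(4, 4))  # stand-in for trained values
mask = np.abs(w_trained) > np.median(np.abs(w_trained))  # keep the larger half

def starting_weights(strategy):
    """Weights from which retraining of the pruned network begins."""
    if strategy == "fine_tune":   # continue from the trained weights
        start = w_trained
    elif strategy == "rewind":    # reset to the pre-training values
        start = w_init
    elif strategy == "reinit":    # draw fresh random values
        start = np.random.default_rng(1).normal(size=w_init.shape)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return start * mask           # pruned positions stay at zero

for strategy in ("fine_tune", "rewind", "reinit"):
    print(strategy, np.count_nonzero(starting_weights(strategy)))
```

In every case the sparsity pattern is identical; only the surviving values differ.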

CompML Introduction to Pruning Methods

CompML Papers Covered
• Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
• Pruning Filters for Efficient ConvNets
• Channel Pruning for Accelerating Very Deep Neural Networks
• Neuron Merging: Compensating for Pruned Neurons
• Rethinking the Value of Network Pruning
• The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
• SNIP: Single-shot Network Pruning based on Connection Sensitivity
• Learning Efficient Convolutional Networks through Network Slimming
• To prune, or not to prune: exploring the efficacy of pruning for model compression
Summaries of each paper: https://github.com/CompML/survey-neural-network-pruning/issues

CompML Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding (Han et al., 2016 [7])
A trained network is pruned at the level of individual weights and retrained. Quantization is then applied, followed by another round of retraining, and finally Huffman coding. Required storage is reduced by 35x to 45x while keeping accuracy on par with the original model.

CompML Pruning Filters for Efficient ConvNets (Li et al., 2016 [8])
A filter-level pruning method. In contrast to weight-level pruning, it needs no sparse convolution library support and runs on existing libraries. Reduces inference cost by 30%.
Method:
1. For each filter in each layer, sum the absolute values of its weights, and prune filters starting from the smallest sum.
2. Decide how many filters to prune in each layer according to that layer's sensitivity to pruning.
3. Prune => retrain is a single cycle: filters from multiple layers are pruned at once, and the network is retrained until the original accuracy is recovered.
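
Step 1 can be sketched for a pair of consecutive convolution layers. This is an illustration under assumptions, not the authors' code: the `prune_filters` helper, the `(out_c, in_c, kh, kw)` weight layout, and the random weights are hypothetical, and the sensitivity analysis of step 2 is replaced by a fixed `n_prune`.

```python
import numpy as np

def prune_filters(w_cur, w_next, n_prune):
    """Remove the n_prune filters of w_cur with the smallest L1 norm.

    w_cur:  (out_c, in_c, kh, kw) weights of the layer being pruned
    w_next: (next_out_c, out_c, kh, kw) weights of the following layer
    """
    l1 = np.abs(w_cur).sum(axis=(1, 2, 3))    # one score per filter
    keep = np.sort(np.argsort(l1)[n_prune:])  # indices of surviving filters
    # Dropping a filter removes its output feature map, so the matching
    # input channel of the next layer is removed as well.
    return w_cur[keep], w_next[:, keep], keep

rng = np.random.default_rng(0)
w1 = rng.normal(size=(8, 3, 3, 3))
w2 = rng.normal(size=(16, 8, 3, 3))
w1[2] *= 0.01  # make two filters obviously unimportant
w1[5] *= 0.01

p1, p2, kept = prune_filters(w1, w2, n_prune=2)
print(kept)                # [0 1 3 4 6 7]
print(p1.shape, p2.shape)  # (6, 3, 3, 3) (16, 6, 3, 3)
```

Both returned tensors are dense, which is why this style of pruning runs on standard convolution routines.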

CompML Channel Pruning for Accelerating Very Deep Neural Networks (He et al., 2017 [9])
Proposes an inference-time pruning method that selects channels with LASSO regression and reconstructs the feature maps by least squares. It also handles multi-branch networks such as ResNet. Achieves a 2x to 5x inference speedup with an accuracy drop of 0.3% to 1.4%.

CompML Neuron Merging: Compensating for Pruned Neurons (Kim et al., 2020 [10])
To limit the accuracy loss caused by filter-level pruning, proposes a neuron merging method that folds the information of each pruned filter into another filter. Retains more of the original model's information than pruning alone.

CompML Rethinking the Value of Network Pruning (Liu et al., 2019 [11])
Shows, through comparisons across multiple models, that a pruned model trained from scratch with randomly initialized weights reaches accuracy equal to or higher than that of the fine-tuned pruned model. Also examines the potential of pruning as a form of architecture search.

CompML The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks (Frankle et al., 2019 [12])
Proposes the Lottery Ticket Hypothesis (※). During training, "prune => reset the remaining weights to their initial values" is applied (repeatedly), and the resulting subnetwork is checked against the accuracy of the original network. The authors found subnetworks at 10-20% or less of the original size that learn faster than the original network and reach higher accuracy.
※ A randomly initialized dense network contains a subnetwork (a winning ticket) that can match the accuracy of the original network when trained for at most the same number of iterations.
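
The "prune => reset" loop can be sketched as iterative magnitude pruning on a toy problem. Everything concrete here is an assumption for illustration: the linear least-squares `train` function stands in for real network training, and `find_ticket`, the number of rounds, and the 30% per-round pruning fraction are hypothetical choices. The sketch handles a 1-D weight vector only.

```python
import numpy as np

def train(w, x, y, mask, steps=300, lr=0.05):
    """Toy stand-in for training: gradient descent on a least-squares loss."""
    w = w * mask
    for _ in range(steps):
        grad = 2.0 * x.T @ (x @ w - y) / len(x)
        w = (w - lr * grad) * mask
    return w

def find_ticket(w_init, x, y, rounds=3, prune_frac=0.3):
    """Iterative magnitude pruning with reset to the initial weights."""
    mask = np.ones_like(w_init)
    for _ in range(rounds):
        w = train(w_init, x, y, mask)  # always restart from w_init
        alive = np.flatnonzero(mask)
        n_prune = int(prune_frac * alive.size)
        weakest = alive[np.argsort(np.abs(w[alive]))[:n_prune]]
        mask[weakest] = 0.0            # prune the smallest surviving weights
    return mask

rng = np.random.default_rng(0)
x = rng.normal(size=(512, 20))
true_w = np.zeros(20)
true_w[:5] = [4.0, -3.0, 2.5, -2.0, 1.5]
y = x @ true_w
w_init = rng.normal(scale=0.1, size=20)

mask = find_ticket(w_init, x, y)
ticket = train(w_init, x, y, mask)  # train the sparse subnetwork from w_init
print(int(mask.sum()), float(np.mean((x @ ticket - y) ** 2)))
```

The key detail is that every round, and the final training run, starts from the same `w_init` rather than from the trained values.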

CompML SNIP: Single-shot Network Pruning based on Connection Sensitivity (Lee et al., 2019 [13])
Introduces a data-dependent importance criterion that identifies, before training, the connections in the network that matter for the given task. The network is pruned according to this criterion and then trained in the standard way, so neither pretraining nor a complex pruning schedule is needed. In experiments, the method produced highly sparse networks with the same accuracy as the baseline.
• Finds important connections from their influence on the loss function at a variance-scaling initialization.
• Given a desired sparsity level, redundant connections are pruned once before training (single shot); the pruned network is then trained.
• Recommends a variance-scaling method for initializing the network weights.
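
A connection-sensitivity score of this kind can be sketched as the magnitude of weight times gradient, normalized over all connections. The toy linear model with a closed-form gradient, the `snip_mask` helper, and the data are assumptions made to keep the example self-contained; a real network would obtain the gradient by backpropagation on a mini-batch.

```python
import numpy as np

def snip_mask(w, x, y, sparsity):
    """Single-shot mask from connection sensitivity on one mini-batch.

    Toy model: y_hat = x @ w with mean squared error, so the gradient is
    available in closed form.
    """
    grad_w = 2.0 * x.T @ (x @ w - y) / len(x)  # dL/dw at initialization
    g = grad_w * w                             # dL/dc at c = 1, where c masks w
    saliency = np.abs(g) / np.abs(g).sum()     # normalized sensitivity
    n_keep = w.size - int(round(sparsity * w.size))
    mask = np.zeros_like(w)
    order = np.argsort(saliency)[::-1]         # most sensitive first
    mask[order[:n_keep]] = 1.0
    return mask, saliency

rng = np.random.default_rng(0)
x = rng.normal(size=(128, 10))
true_w = np.zeros(10)
true_w[:3] = [5.0, -4.0, 3.0]
y = x @ true_w
w0 = rng.normal(scale=np.sqrt(1.0 / 10), size=10)  # variance-scaled init

mask, saliency = snip_mask(w0, x, y, sparsity=0.7)
print(int(mask.sum()))  # 3
```

The mask is computed once, before any training step, and then held fixed while the surviving weights are trained.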

CompML Learning Efficient Convolutional Networks through Network Slimming (Liu et al., 2017 [14])
An automatic channel-level pruning method. The network is trained with an L1 penalty on the batch normalization scaling factors γ, channels with small γ are removed, and the network is retrained. On VGGNet, model size is reduced by 20x and computation by 5x.
Loss function: g(·) is the L1 regularization term.
Batch normalization: γ is the scale, β is the shift.
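
For reference, the training objective and the batch normalization transform used by Network Slimming can be written as follows. This is a reconstruction in the notation of the paper [14], where the symbols beyond g, γ, and β are assumptions of this sketch: (x, y) are training pairs, W the trainable weights, l the task loss, λ the weight of the sparsity penalty, Γ the set of scaling factors, and μ_B, σ_B² the mini-batch mean and variance.

```latex
L = \sum_{(x, y)} l\bigl(f(x, W), y\bigr) + \lambda \sum_{\gamma \in \Gamma} g(\gamma),
\qquad g(\gamma) = |\gamma|

\hat{z} = \frac{z_{\mathrm{in}} - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^{2} + \epsilon}},
\qquad z_{\mathrm{out}} = \gamma \hat{z} + \beta
```

Because each channel's output is multiplied by its own γ, a γ driven close to zero by the penalty marks a channel that contributes little and can be removed.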

CompML To Prune, or Not to Prune: Exploring the Efficacy of Pruning for Model Compression (Zhu et al., 2018 [15])
Compares large pruned networks (large-sparse) with small dense models (small-dense). The large sparse models were more accurate than the small dense ones, and reduced the number of non-zero parameters by up to 10x with minimal loss of accuracy. The paper also proposes an automated gradual pruning method.
Automated gradual pruning algorithm:
1. Sort the weights of each layer by absolute value and mask the smallest-magnitude weights to zero until the target sparsity level is reached.
2. Starting at training step t_0, increase the sparsity from the initial level s_i to the final level s_f over n pruning steps with pruning frequency ∆t.
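
The schedule in step 2 can be sketched with the cubic form s_t = s_f + (s_i − s_f)(1 − (t − t_0)/(n∆t))³ given in the paper [15]. The function name `target_sparsity` and the example values (90% final sparsity, 100 pruning steps, one every 100 training steps) are assumptions for illustration.

```python
def target_sparsity(t, s_i, s_f, t0, n, dt):
    """Sparsity level at training step t under a cubic gradual schedule."""
    if t < t0:
        return s_i                  # pruning has not started yet
    t = min(t, t0 + n * dt)         # hold s_f after the last pruning step
    progress = (t - t0) / (n * dt)  # 0.0 at t0, 1.0 at the end
    return s_f + (s_i - s_f) * (1.0 - progress) ** 3

for step in (0, 2500, 5000, 10000):
    print(step, round(target_sparsity(step, s_i=0.0, s_f=0.9, t0=0, n=100, dt=100), 4))
```

The cubic shape prunes quickly at the start, when many weights are redundant, and slows down as fewer weights remain; in practice the mask would be updated only at steps t_0, t_0 + ∆t, and so on.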

CompML Evaluating Pruning Methods

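CompML Evaluating Pruning
So which method is best?
⇨ Many pruning methods have been proposed, but they are evaluated in inconsistent ways and cannot be compared side by side (Blalock et al., 2020 [2]).
• The network architectures and datasets used in experiments differ.
• Fine-grained hyperparameters differ.
• Model-efficiency metrics (FLOPs, pruning ratio, and so on) differ, and so do the ways they are computed.
• Results are reported inconsistently (the pruning ratios tested differ).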

CompML Pruning Benchmarks
Several benchmarks have been proposed.
• What is the State of Neural Network Pruning? (Blalock et al., 2020 [2])
• The State of Sparsity in Deep Neural Networks (Gale et al., 2019 [16])
• DeepBench: Benchmarking Deep Learning Operations on Different Hardware (S. Narang, 2016 [17])

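CompML Effects of Pruning
Findings from the meta-analysis of Blalock et al., 2020 [2]:
• Many methods lose only a little accuracy, or none at all.
• When a large network is pruned heavily, pruning methods reach higher accuracy than random pruning. This does not necessarily hold when pruning small networks.
• Allocating parameters differently across layers gives higher accuracy than pruning all layers uniformly.
• Fine-tuning gives higher accuracy than training from scratch with the same pruning pattern.
• Switching to a better architecture is more effective than pruning.
• Pruning is more effective when the architecture is inefficient.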

CompML Extras

CompML Research on Accelerating Sparse Neural Networks on GPUs
A line of research on GPU kernels that speed up computation in pruned networks. For unstructured pruning, GPU kernels that accelerate training and inference are still maturing.
• GPU Kernels for Block-Sparse Weights (Gray et al., 2017 [18])
• Fast Sparse ConvNets (Elsen et al., 2020 [19])
• Sparse GPU Kernels for Deep Learning (Gale et al., 2020 [20])
• SparseRT: Accelerating Unstructured Sparsity on GPUs for Deep Learning Inference (Wang, 2020 [21])

CompML Reference

[1] Han, S., Pool, J., Tran, J., and Dally, W. Learning both weights and connections for efficient neural network. In Advances in Neural Information Processing Systems, pp. 1135–1143, 2015.
[2] Blalock, D., Ortiz, J. J. G., Frankle, J., and Guttag, J. What is the state of neural network pruning? arXiv preprint arXiv:2003.03033, 2020.
[3] Mirkes, E. M. Artificial neural network pruning to extract knowledge. In 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, 2020.
[4] Lee, N., Ajanthan, T., and Torr, P. H. S. SNIP: Single-shot network pruning based on connection sensitivity. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019.
[5] Frankle, J. and Carbin, M. The lottery ticket hypothesis: Finding sparse, trainable neural networks. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019.
[6] Liu, Z., Sun, M., Zhou, T., Huang, G., and Darrell, T. Rethinking the value of network pruning. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019.
[7] Han, S., Mao, H., and Dally, W. J. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. In Bengio, Y. and LeCun, Y. (eds.), 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings, 2016.
[8] Li, H., Kadav, A., Durdanovic, I., Samet, H., and Graf, H. P. Pruning filters for efficient convnets. arXiv preprint arXiv:1608.08710, 2016.
[9] He, Y., Zhang, X., and Sun, J. Channel pruning for accelerating very deep neural networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1389–1397, 2017.
[10] Kim, W., Kim, S., Park, M., and Jeon, G. Neuron merging: Compensating for pruned neurons. In Advances in Neural Information Processing Systems, 33, 2020.
[11] Liu, Z., Sun, M., Zhou, T., Huang, G., and Darrell, T. Rethinking the value of network pruning. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019.
[12] Frankle, J. and Carbin, M. The lottery ticket hypothesis: Finding sparse, trainable neural networks. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019.
[13] Lee, N., Ajanthan, T., and Torr, P. H. S. SNIP: Single-shot network pruning based on connection sensitivity. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019.
[14] Liu, Z., Li, J., Shen, Z., Huang, G., Yan, S., and Zhang, C. Learning efficient convolutional networks through network slimming. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2736–2744, 2017.
[15] Zhu, M. H. and Gupta, S. To prune, or not to prune: Exploring the efficacy of pruning for model compression, 2018.
[16] Gale, T., Elsen, E., and Hooker, S. The state of sparsity in deep neural networks, 2019.
[17] Narang, S. DeepBench: Benchmarking deep learning operations on different hardware. https://github.com/baidu-research/DeepBench, 2016.
[18] Gray, S., Radford, A., and Kingma, D. P. GPU kernels for block-sparse weights. arXiv preprint arXiv:1711.09224, 2017.
[19] Elsen, E., Dukhan, M., Gale, T., and Simonyan, K. Fast sparse ConvNets. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14629–14638, 2020.
[20] Gale, T., Zaharia, M., Young, C., and Elsen, E. Sparse GPU kernels for deep learning. arXiv preprint arXiv:2006.10901, 2020.
[21] Wang, Z. SparseRT: Accelerating unstructured sparsity on GPUs for deep learning inference. In Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, pp. 31–42, 2020.
