The environmental impact of present ML models and how we can improve it (JP)

The Environmental Impact of ML Models and How to Improve It. DSOC R&D, Xing Li (李星). 2020/07/21, Machine Learning Study Group

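Data Strategy and Operation Center
Self-introduction: Xing Li (李星). Joined Sansan as a new graduate in October 2019 and was assigned to DSOC. Applies a range of machine learning methods efficiently to data analysis. Currently works on recommender systems and NLP-related tasks. Online business card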

NLP deep learning models keep getting bigger… [16]

CV deep learning models keep getting bigger… [4]

Training time for deep learning is also increasing [17]

Calculation method [2]
p_c: average power draw from all CPU sockets during training (in watts)
p_r: average power draw from all DRAM (main memory) sockets
p_g: average power draw of a GPU during training
g: number of GPUs used for training
t: total training time
1.58: Power Usage Effectiveness (PUE) [13]
0.954: average CO2 emitted per unit of power consumed in the U.S. [14]
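The variables above plug into the estimate from [2]. The formula itself does not survive in the slide text, so the sketch below is reconstructed from that paper and should be checked against it. The units are assumptions (t in hours, energy in kWh, CO2 in pounds), and the wattages in the example are made-up values, not measurements.

```python
PUE = 1.58            # Power Usage Effectiveness [13]
CO2_PER_KWH = 0.954   # average U.S. CO2 per kWh; assumed to be in pounds [14]

def total_energy_kwh(p_c, p_r, p_g, g, t_hours):
    """Energy in kWh: PUE * t * (p_c + p_r + g * p_g) / 1000, powers in watts."""
    return PUE * t_hours * (p_c + p_r + g * p_g) / 1000.0

def co2_emitted(p_c, p_r, p_g, g, t_hours):
    """CO2 estimate: energy in kWh times the average emission factor."""
    return CO2_PER_KWH * total_energy_kwh(p_c, p_r, p_g, g, t_hours)

# Hypothetical job: 8 GPUs at 250 W each, 100 W CPU, 20 W DRAM, 24 hours.
energy_kwh = total_energy_kwh(p_c=100.0, p_r=20.0, p_g=250.0, g=8, t_hours=24.0)
co2 = co2_emitted(p_c=100.0, p_r=20.0, p_g=250.0, g=8, t_hours=24.0)
```

Because the GPU term is multiplied by g, the number of GPUs and the training time dominate the estimate for most multi-GPU jobs.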

A surprising comparison [2]

CO2 emissions of common machine learning models [2]

Where is most of the energy consumed? [20]

Two aspects of building more energy-efficient models while keeping performance within an acceptable range.

Algorithm
1. Mixed Precision (FP16 & FP32)
2. Model Distillation
3. Model Pruning
4. Weight Quantization & Sharing
5. Others

Mixed Precision: where does the "mixed" come from?
One mixed precision training iteration for a single layer [8]
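As I understand [8], the "mixed" refers to running the forward and backward passes in FP16 while keeping an FP32 master copy of the weights and scaling the loss so that small gradients do not underflow. The sketch below only illustrates those two numerical effects with Python's `struct` half-precision format; it is not a training loop, Python's 64-bit floats stand in for FP32, and the gradient, scale, and update values are arbitrary.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

# 1) A small gradient underflows to zero when stored in FP16.
grad = 1e-8
grad_fp16 = to_fp16(grad)

# 2) Scaling the loss (and therefore the gradient) keeps it representable;
#    the scale is divided out again in higher precision.
loss_scale = 1024.0
scaled_grad_fp16 = to_fp16(grad * loss_scale)
recovered_grad = scaled_grad_fp16 / loss_scale

# 3) A small update is lost if the weight itself lives in FP16,
#    which is why a higher-precision master copy is kept.
weight, update = 1.0, 1e-4
weight_fp16 = to_fp16(to_fp16(weight) + update)   # rounds back to 1.0
weight_master = weight + update                    # keeps the update
```

In practice the library guides linked on the next slide handle casting and loss scaling automatically.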

Mixed Precision: performance is maintained [8]

Mixed Precision: main library support
PyTorch Mixed Precision Tutorial: https://pytorch.org/docs/stable/notes/amp_examples.html
TensorFlow Mixed Precision Guide: https://www.tensorflow.org/guide/mixed_precision

Model Distillation: the core idea [11]

Model Distillation: examples [9]
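The idea in [9], as I read it, is to train a small student on the teacher's temperature-softened output distribution in addition to the true labels. The following is a minimal sketch of such a loss in plain Python; the function names, the weighting `alpha`, the temperature, and the logits are illustrative choices, not values from the slides.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with temperature; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target_probs, predicted_probs):
    return -sum(t * math.log(p) for t, p in zip(target_probs, predicted_probs) if t > 0)

def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-target term and the usual hard-label term."""
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    # The T^2 factor keeps the soft term's gradient scale comparable across temperatures.
    soft_loss = cross_entropy(soft_teacher, soft_student) * temperature ** 2
    hard_loss = -math.log(softmax(student_logits)[true_label])
    return alpha * soft_loss + (1 - alpha) * hard_loss

teacher = [5.0, 2.0, 0.5]
loss_close = distillation_loss([4.5, 2.2, 0.4], teacher, true_label=0)
loss_far = distillation_loss([0.5, 2.0, 5.0], teacher, true_label=0)
```

A student whose logits resemble the teacher's gets a lower loss than one that ranks the classes the other way round, which is the signal the student is trained on.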

Model Distillation: useful distilled-model implementations
GitHub: https://github.com/dkozlov/awesome-knowledge-distillation

Model Pruning: basic framework and concept [15]
Training → Pruning → Fine-tuning

Model Pruning: training results of the original model [15]

Model Pruning: results after applying pruning only [15]

Model Pruning: results after pruning + retraining [15]

Model Pruning: results after repeated pruning and retraining [15]
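The train, prune, fine-tune cycle can be sketched with magnitude-based pruning, which is my reading of the criterion used in [15]. The code below works on a flat list of weights. The `fine_tune` callback is a hypothetical stand-in for real retraining, and the sparsity schedule is arbitrary.

```python
def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that drops the `sparsity` fraction of smallest-|w| weights."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    mask = [1] * len(weights)
    for i in order[:n_prune]:
        mask[i] = 0
    return mask

def apply_mask(weights, mask):
    return [w * m for w, m in zip(weights, mask)]

def iterative_prune(weights, sparsities, fine_tune):
    """Alternate pruning and fine-tuning over an increasing sparsity schedule."""
    mask = [1] * len(weights)
    for sparsity in sparsities:
        mask = magnitude_mask(weights, sparsity)
        weights = apply_mask(weights, mask)
        # Re-apply the mask so pruned connections stay removed after fine-tuning.
        weights = apply_mask(fine_tune(weights), mask)
    return weights, mask

weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2]
one_shot_mask = magnitude_mask(weights, 0.5)

# Stand-in for retraining: it only rescales the surviving weights.
pruned, final_mask = iterative_prune(
    weights, [0.33, 0.5, 0.67], fine_tune=lambda ws: [w * 1.1 for w in ws]
)
```

Already-pruned weights are zero, so they are always among the smallest magnitudes and stay pruned as the sparsity target increases.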

Model Pruning: main library support
PyTorch Pruning Tutorial: https://pytorch.org/tutorials/intermediate/pruning_tutorial.html
TensorFlow Pruning Tutorial: https://www.tensorflow.org/model_optimization/guide/pruning

Weight Quantization & Sharing: the core idea [5]

Weight Quantization & Sharing: K-means initialization [5]
Three methods for initializing the centroids (cluster centers).
Correspondence between the weight distribution (blue), the distribution before fine-tuning (× green cross), and the distribution after fine-tuning (• red dot).
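Weight sharing clusters the weights of a layer with k-means and stores only a small codebook plus one index per weight. The sketch below uses linear initialization, one of the three schemes I recall [5] comparing, on a toy 1-D weight list. The centroid fine-tuning step from the paper is not shown, and all names and values are illustrative.

```python
def linear_init(weights, k):
    """Spread k initial centroids evenly between the min and max weight."""
    lo, hi = min(weights), max(weights)
    return [lo + (hi - lo) * i / (k - 1) for i in range(k)]

def nearest(w, centroids):
    """Index of the centroid closest to w."""
    return min(range(len(centroids)), key=lambda c: abs(w - centroids[c]))

def kmeans_1d(weights, centroids, n_iter=20):
    """Plain Lloyd iterations on scalars; returns (codebook, index per weight)."""
    centroids = list(centroids)
    for _ in range(n_iter):
        assign = [nearest(w, centroids) for w in weights]
        for c in range(len(centroids)):
            members = [w for w, a in zip(weights, assign) if a == c]
            if members:                   # leave an empty cluster where it is
                centroids[c] = sum(members) / len(members)
    return centroids, [nearest(w, centroids) for w in weights]

weights = [-1.0, -0.9, -0.1, 0.0, 0.1, 0.9, 1.0, 1.1]
codebook, indices = kmeans_1d(weights, linear_init(weights, k=4))
quantized = [codebook[i] for i in indices]   # what the layer uses at inference
```

With k = 4, each weight needs only a 2-bit index instead of a 32-bit float, at the cost of storing the four shared values.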

Weight Quantization & Sharing: a further compression trick (Huffman Coding) [5]
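Huffman coding helps because the cluster indices are usually unevenly distributed, so frequent indices can take shorter codes. Here is a minimal sketch using Python's `heapq`; the index stream is a made-up example rather than data from [5].

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a prefix-free code (symbol -> bit string) from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:                    # degenerate case: a single symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (count, unique tiebreak, partial code table).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        c1, _, codes1 = heapq.heappop(heap)
        c2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (c1 + c2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

# Hypothetical stream of 2-bit cluster indices with a skewed distribution.
indices = [0] * 10 + [1] * 3 + [2] * 2 + [3] * 1
code = huffman_code(indices)
huffman_bits = sum(len(code[i]) for i in indices)
fixed_bits = 2 * len(indices)
```

For this stream the variable-length code is shorter than the fixed 2-bit encoding; a uniform index distribution would gain nothing.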

Others: most require redesigning the original network structure
• Specific network architectures: ShuffleNet, MobileNet, BottleNet, SqueezeNet [6], etc.
• Winograd transform
• Low-rank approximation
• Binary/Ternary Net
• …

Hardware
Minimize memory access! I do not intend to discuss hardware chip design here. However, it is important to know how well your model fits the hardware, so that you can judge whether to keep optimizing the algorithm or to move to a better x(C/G/T)PU.

Roofline Model: definition [18]

Roofline Model: how to use it [18]
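In the usual formulation of the Roofline model, attainable performance is the smaller of the compute peak and memory bandwidth times arithmetic intensity (FLOPs per byte moved). The sketch below assumes that formulation; the device numbers are hypothetical and do not describe any real chip.

```python
def attainable_gflops(peak_gflops, bandwidth_gb_s, arithmetic_intensity):
    """Roofline bound: min(compute peak, bandwidth * FLOPs-per-byte)."""
    return min(peak_gflops, bandwidth_gb_s * arithmetic_intensity)

def ridge_point(peak_gflops, bandwidth_gb_s):
    """Arithmetic intensity above which a kernel becomes compute-bound."""
    return peak_gflops / bandwidth_gb_s

def is_memory_bound(peak_gflops, bandwidth_gb_s, arithmetic_intensity):
    return arithmetic_intensity < ridge_point(peak_gflops, bandwidth_gb_s)

# Hypothetical device: 10,000 GFLOP/s peak, 500 GB/s memory bandwidth.
PEAK, BANDWIDTH = 10_000.0, 500.0
low_intensity = attainable_gflops(PEAK, BANDWIDTH, 4.0)    # limited by bandwidth
high_intensity = attainable_gflops(PEAK, BANDWIDTH, 50.0)  # limited by compute peak
```

A memory-bound layer gains little from a faster processor, so reducing memory traffic (for example through pruning or quantization) is the more promising direction; a compute-bound layer is where better hardware pays off.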

Choosing greener hardware: the xPU question [1]

Choosing greener hardware: the location question [1]

Choosing greener hardware: the platform question [2]

Choosing greener hardware: same platform, different locations [1]
Amazon Web Services

Summary
Without changing the network structure:
• Algorithm - Mixed Precision (FP16 & FP32)
• Algorithm - Model Distillation
• Algorithm - Model Pruning
• Algorithm - Weight Quantization & Sharing
• Hardware - Use the Roofline model to improve energy efficiency.
• Hardware - Choose the device, location, and platform carefully.
When redesigning the model:
• Use specific network architectures
• Winograd transform
• Low-rank approximation
• Binary/Ternary Net
When improving at the hardware level:
• Out of scope for this talk

Planting trees is another way to compensate: the lifetime absorption of 6 trees ≈ 1 ton of CO2 [19]

Sansan plants trees too!

References
1. Quantifying the Carbon Emissions of Machine Learning (https://arxiv.org/pdf/1910.09700.pdf)
2. Energy and Policy Considerations for Deep Learning in NLP (https://arxiv.org/pdf/1906.02243.pdf)
3. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter (https://arxiv.org/pdf/1910.01108.pdf)
4. Neural Network Architectures (https://towardsdatascience.com/neural-network-architectures-156e5bad51ba)
5. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding (https://arxiv.org/pdf/1510.00149.pdf)
6. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size (https://arxiv.org/pdf/1602.07360.pdf)
7. Deep Learning Performance Documentation, Nvidia (https://docs.nvidia.com/deeplearning/performance/mixed-precision-training/index.html#mptrain__fig1)
8. Mixed Precision Training (https://arxiv.org/pdf/1710.03740.pdf)
9. Distilling the Knowledge in a Neural Network (https://arxiv.org/pdf/1503.02531.pdf)
10. Distilling Task-Specific Knowledge from BERT into Simple Neural Networks (https://arxiv.org/pdf/1903.12136.pdf)
11. Knowledge Distillation: Simplified (https://towardsdatascience.com/knowledge-distillation-simplified-dd4973dbc764)
12. ML CO2 Impact (https://mlco2.github.io/impact/#home)
13. Rhonda Ascierto. 2018. Uptime Institute Global Data Center Survey. Technical report, Uptime Institute.
14. EPA. 2018. Emissions & Generation Resource Integrated Database (eGRID). Technical report, U.S. Environmental Protection Agency.
15. Learning both Weights and Connections for Efficient Neural Networks (https://papers.nips.cc/paper/5784-learning-both-weights-and-connections-for-efficient-neural-network.pdf)
16. GPT-3: The New Mighty Language Model from OpenAI (https://mc.ai/gpt-3-the-new-mighty-language-model-from-openai-2/)
17. AI and Compute (https://openai.com/blog/ai-and-compute/)
18. Performance Analysis (HPC Course, University of Bristol)
19. Reduce your carbon footprint by Planting a tree (https://co2living.com/reduce-your-carbon-footprint-by-planting-a-tree/)
20. EIE: Efficient Inference Engine on Compressed Deep Neural Network (https://arxiv.org/pdf/1602.01528.pdf)
