Slide 23
References
▪ Jacob, Benoit, et al. "Quantization and training of neural networks for efficient integer-arithmetic-only inference." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
▪ Zhou, Aojun, et al. "Incremental network quantization: Towards lossless CNNs with low-precision weights." arXiv preprint arXiv:1702.03044 (2017).
▪ Lin, Ji, et al. "AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration." arXiv preprint arXiv:2306.00978 (2023).
▪ Krishnamoorthi, Raghuraman. "Quantizing deep convolutional networks for efficient inference: A whitepaper." arXiv preprint arXiv:1806.08342 (2018).
▪ Nagel, Markus, et al. "A white paper on neural network quantization." arXiv preprint arXiv:2106.08295 (2021).
▪ Lin, Xiaofan, Cong Zhao, and Wei Pan. "Towards accurate binary convolutional neural network." Advances in Neural Information Processing Systems 30 (2017).
▪ Gholami, Amir, et al. "A survey of quantization methods for efficient neural network inference." Low-Power Computer Vision. Chapman and Hall/CRC, 2022. 291-326.
▪ Nagel, Markus, et al. "Data-free quantization through weight equalization and bias correction." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019.
▪ Nagel, Markus, et al. "Up or down? Adaptive rounding for post-training quantization." International Conference on Machine Learning. PMLR, 2020.
▪ Dettmers, Tim, et al. "LLM.int8(): 8-bit matrix multiplication for transformers at scale." arXiv preprint arXiv:2208.07339 (2022).
▪ Wei, Xiuying, et al. "Outlier Suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling." arXiv preprint arXiv:2304.09145 (2023).
▪ Xiao, Guangxuan, et al. "SmoothQuant: Accurate and efficient post-training quantization for large language models." International Conference on Machine Learning. PMLR, 2023.