al. 2022. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. In Proceedings of NeurIPS 2022.
[3] Xuezhi Wang, Jason Wei, Dale Schuurmans, et al. 2023. Self-Consistency Improves Chain of Thought Reasoning in Language Models. In Proceedings of ICLR 2023.
[4] 高野・齋藤・石原. 2026. 『Kaggleではじめる大規模言語モデル入門』 [Introduction to Large Language Models with Kaggle]. 講談社 (Kodansha).
[5] https://github.com/huggingface/transformers
[6] https://github.com/huggingface/trl
[7] https://github.com/huggingface/sentence-transformers
[8] Edward J. Hu, Yelong Shen, Phillip Wallis, et al. 2022. LoRA: Low-Rank Adaptation of Large Language Models. In Proceedings of ICLR 2022.
[9] Tim Dettmers, Mike Lewis, Younes Belkada, et al. 2022. LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale. In Proceedings of NeurIPS 2022.
[10] Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, et al. 2023. QLoRA: Efficient Finetuning of Quantized LLMs. In Proceedings of NeurIPS 2023.
[11] https://github.com/bitsandbytes-foundation/bitsandbytes
[12] Elias Frantar, Saleh Ashkboos, Torsten Hoefler, et al. 2023. GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers. In Proceedings of ICLR 2023.
[13] https://github.com/ModelCloud/GPTQModel
[14] https://github.com/vllm-project/llm-compressor
[15] Ji Lin, Jiaming Tang, Haotian Tang, et al. 2024. AWQ: Activation-aware Weight Quantization for On-device LLM Compression and Acceleration. GetMobile: Mobile Computing and Communications, 28(4):12–17.