References
[1] https://developers.openai.com/api/docs/guides/optimizing-llm-accuracy
[2] Jason Wei, Xuezhi Wang, Dale Schuurmans, et al. 2022. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. In Proceedings of NeurIPS 2022.
[3] Xuezhi Wang, Jason Wei, Dale Schuurmans, et al. 2023. Self-Consistency Improves Chain of Thought Reasoning in Language Models. In Proceedings of ICLR 2023.
[4] 高野・齋藤・石原. 2026. 『Kaggleではじめる大規模言語モデル入門』. 講談社.
[5] https://github.com/huggingface/transformers
[6] https://github.com/huggingface/trl
[7] https://github.com/huggingface/sentence-transformers
[8] Edward J. Hu, Yelong Shen, Phillip Wallis, et al. 2022. LoRA: Low-Rank Adaptation of Large Language Models. In Proceedings of ICLR 2022.
[9] Tim Dettmers, Mike Lewis, Younes Belkada, et al. 2022. LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale. In Proceedings of NeurIPS 2022.
[10] Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, et al. 2023. QLoRA: Efficient Finetuning of Quantized LLMs. In Proceedings of NeurIPS 2023.
[11] https://github.com/bitsandbytes-foundation/bitsandbytes
[12] Elias Frantar, Saleh Ashkboos, Torsten Hoefler, et al. 2023. GPTQ: Accurate Quantization for Generative Pre-trained Transformers. In Proceedings of ICLR 2023.
[13] https://github.com/ModelCloud/GPTQModel
[14] https://github.com/vllm-project/llm-compressor
[15] Ji Lin, Jiaming Tang, Haotian Tang, et al. 2024. AWQ: Activation-aware Weight Quantization for On-device LLM Compression and Acceleration. GetMobile: Mobile Computing and Communications, 28(4):12–17.