Avi Singh, Kate Baumli, Shariq Iqbal, Colton Bishop, Rebecca Roelofs, Lei M Zhang, Kay McKinney, Disha Shrivastava, Cosmin Paduraru, George Tucker, Doina Precup, Feryal Behbahani, Aleksandra Faust
Training Language Models to Self-Correct via Reinforcement Learning
ICLR 2025 (Oral)
Presenter: 大井 聖也, Institute of Science Tokyo, second-year master's student (M2), Inoue Laboratory
2025/09/01, The 17th Cutting-Edge NLP Study Group (最先端NLP勉強会)
Note: Unless otherwise noted, figures and tables are taken from the paper.
Language Models Latently Perform Multi-Hop Reasoning? ACL 2024.
[2] Huang et al. Large Language Models Cannot Self-Correct Reasoning Yet. ICLR 2024.
[3] Qu et al. Recursive Introspection: Teaching LLM Agents How to Self-Improve. NeurIPS 2024.
[4] Kim et al. Language Models can Solve Computer Tasks. NeurIPS 2023.
[5] Havrilla et al. GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements. ICML 2024.