Slide 4
Papers: September
Profile
• PingPong: A Benchmark for Role-Playing Language Models with User Emulation and Multi-Model Evaluation
Reasoning
• LOGICGAME: Benchmarking Rule-Based Reasoning Abilities of Large Language Models
• To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
• Iteration of Thought: Leveraging Inner Dialogue for Autonomous Large Language Model Reasoning
• Improving LLM Reasoning with Multi-Agent Tree-of-Thought Validator Agent
• MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning
Self-Correction
• CoT Rerailer: Enhancing the Reliability of Large Language Models in Complex Reasoning Tasks through Error Detection and Correction
• An Empirical Study on Self-correcting Large Language Models for Data Science Code Generation
Tool Use
• Knowledge Navigator: LLM-guided Browsing Framework for Exploratory Search in Scientific Literature
• ToolACE: Winning the Points of LLM Function Calling
Memory
• Self-evolving Agents with reflective and memory-augmented abilities
• Agent Workflow Memory