Agents
• FlowBench: Revisiting and Benchmarking Workflow-Guided Planning for LLM-based Agents
• Ask-before-Plan: Proactive Language Agents for Real-World Planning
• CoAct: A Global-Local Hierarchy for Autonomous Agent Collaboration
• SELFGOAL: Your Language Agents Already Know How to Achieve High-level Goals
• NATURAL PLAN: Benchmarking LLMs on Natural Language Planning
• Graph-enhanced Large Language Models in Asynchronous Plan Reasoning
• A Human-Like Reasoning Framework for Multi-Phases Planning Task with Large Language Models
• Meta-Task Planning for Language Agents

Long-Context Understanding
• Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA
• LLM In-Context Recall is Prompt Dependent
• Needle In A Multimodal Haystack
• Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models
• BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack
• DrVideo: Document Retrieval Based Long Video Understanding
• Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?
• Chain of Agents: Large Language Models Collaborating on Long-Context Tasks
• Are Long-LLMs A Necessity For Long-Context Tasks?
Reasoning
• Evaluating LLMs on Temporal Reasoning
• Faithful Logical Reasoning via Symbolic Chain-of-Thought
• Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
• From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step

Self-Correction
• When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs
• Devil's Advocate: Anticipatory Reflection for LLM Agents
• Luban: Building Open-Ended Creative Agents via Autonomous Embodied Verification

Prompt Optimization
• For planning: REPROMPT: Planning by Automatic Prompt Engineering for Large Language Models Agents
• For tool use: AVATAR: Optimizing LLM Agents for Tool-Assisted Knowledge Retrieval
• For self-correction: MAGIC: Generating Self-Correction Guideline for In-Context Text-to-SQL

Learning
• SELF-TUNING: Instructing LLMs to Effectively Acquire New Knowledge through Self-Teaching
• HUSKY: A Unified, Open-Source Language Agent for Multi-Step Reasoning
• RE-Adapt: Reverse Engineered Adaptation of Large Language Models
Towards AGI
• Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models

Tool Use
• BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
• Tool Learning with Large Language Models: A Survey

Multimodal Understanding
• CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs
• Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

Evaluation
• The BIGGEN BENCH: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
• A Survey of Useful LLM Evaluation

Alignment: Towards Scalable Automated Alignment of LLMs: A Survey
Caching: LLM-dCache: Improving Tool-Augmented LLMs with GPT-Driven Localized Data Caching
Prediction: Can Language Models Serve as Text-Based World Simulators?
Long-term Dialogue: Hello Again! LLM-powered Personalized Agent for Long-term Dialogue
Self-Evolution: AGENTGYM: Evolving Large Language Model-based Agents across Diverse Environments
• The Prompt Report: A Systematic Survey of Prompting Techniques
• Open-Endedness is Essential for Artificial Superhuman Intelligence
• Position: Foundation Agents as the Paradigm Shift for Decision Making
• AGILE: A Novel Framework of LLM Agents
• LLMs Meet Multimodal Generation and Editing: A Survey

Multi Agent Systems
• Autonomous Agents for Collaborative Task under Information Asymmetry
• EVOAGENT: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms
• MultiAgent Collaboration Attack: Investigating Adversarial Attacks in Large Language Model Collaborations via Debate
• Scaling Large-Language-Model-based Multi-Agent Collaboration
• Multi-Modal and Multi-Agent Systems Meet Rationality: A Survey
• LLM experiments with simulation: Large Language Model Multi-Agent System for Process Simulation Parametrization in Digital Twins
• LLM-Based Cooperative Agents using Information Relevance and Plan Validation
• Facilitating Multi-Role and Multi-Behavior Collaboration of Large Language Models for Online Job Seeking and Recruiting
• A Large Language Model-based multi-agent manufacturing system for intelligent shopfloor
Critique Paper (Meta-)Reviewing
• GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning

Embodied Agents
• A Survey on Vision-Language-Action Models for Embodied AI

Computer Controlled Agents
• CAAP: Context-Aware Action Planning Prompting to Solve Computer Tasks with Front-End UI Only
• Large Language Models Can Self-Improve At Web Agent Tasks
A Human-Like Reasoning Framework for Multi-Phases Planning Task with Large Language Models
• Plan Making: produce a detailed plan based on the information gathered so far
• Refining the plan from a rough outline down to the details, rather than writing out the full detail in one shot, looks like a promising approach (a minimal sketch of this idea follows below)
Agent Capabilities: Planning (updated June 3)
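As a rough illustration of the coarse-to-fine idea above, the sketch below first asks an LLM for a high-level outline and only then expands it into a detailed plan. The two-stage split, the prompts, the call_llm helper, and the gpt-4o-mini model are illustrative assumptions, not the paper's actual method.

```python
# Minimal sketch of coarse-to-fine plan making (not the paper's implementation).
# Assumes an OpenAI-style chat client; the prompts and model name are placeholders.
from openai import OpenAI

client = OpenAI()

def call_llm(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def make_plan(task: str, collected_info: str) -> str:
    # Stage 1: draft a rough outline first instead of a fully detailed plan.
    outline = call_llm(
        f"Task: {task}\nKnown information:\n{collected_info}\n"
        "Write a high-level outline of the plan as 3-6 numbered phases. "
        "Do not include step-by-step details yet."
    )
    # Stage 2: expand the outline into a detailed plan, keeping its structure.
    return call_llm(
        f"Task: {task}\nKnown information:\n{collected_info}\n"
        f"Outline:\n{outline}\n"
        "Expand each phase into concrete, ordered steps, keeping the outline structure."
    )
```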
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
• Once grokking sets in, accuracy rises sharply, even on OOD data
• On the composition task, OOD accuracy did not improve, i.e., the model failed to generalize
Agent Capabilities: Reasoning (updated June 3)
Process Simulation Parametrization in Digital Twins
• Designs an LLM multi-agent system that automatically determines the simulation parameters of a digital twin
• Observation, reasoning, and decision agents collect real-time data from the digital twin, identify the key observations, analyze the data, and generate the parameters (a minimal sketch of this pipeline follows below)
• Users with little domain expertise can operate the digital twin effectively, improving the system's accessibility and efficiency
• The simulation is executed via the multi-agent system (MAS)
Multi Agent Systems (updated June 3)
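A minimal sketch of the observe-reason-decide pipeline described above, assuming an OpenAI-style chat client; the agent prompts, the call_llm and run_simulation helpers, and the JSON parameter format are illustrative assumptions rather than the paper's actual implementation.

```python
# Minimal sketch of the observation -> reasoning -> decision agent chain.
# Prompts, model name, and the digital-twin interface are placeholder assumptions.
import json
from openai import OpenAI

client = OpenAI()

def call_llm(system: str, user: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": user}],
    )
    return resp.choices[0].message.content

def observation_agent(raw_telemetry: dict) -> str:
    # Filter the real-time digital-twin data down to the observations that matter.
    return call_llm("You select the key observations from raw telemetry.",
                    json.dumps(raw_telemetry))

def reasoning_agent(observations: str) -> str:
    # Analyze the selected observations and explain their implications for the simulation.
    return call_llm("You analyze observations from a digital twin.", observations)

def decision_agent(analysis: str) -> dict:
    # Turn the analysis into concrete simulation parameters (assumes JSON-only output).
    raw = call_llm("Output simulation parameters as a JSON object only.", analysis)
    return json.loads(raw)

def parametrize_and_run(raw_telemetry: dict, run_simulation) -> dict:
    # The MAS hands the generated parameters to the (hypothetical) simulator entry point.
    params = decision_agent(reasoning_agent(observation_agent(raw_telemetry)))
    return run_simulation(params)
```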
Multi-Role and Multi-Behavior Collaboration of Large Language Models for Online Job Seeking and Recruiting
• The quality of the answers given in the mock interviews is what matters
Multi Agent Systems (updated June 3)
Error analysis of reviews
• LLMs as Reviewers: compare the quality of human-written reviews with reviews generated by LLMs
• LLMs are especially prone to proposing experiments and analyses outside the paper's scope; their critiques that require domain expertise contain fewer errors
• LLMs as Metareviewers: evaluate whether an LLM can identify the problems within individual reviews (sketched below)
• Effective at identifying formal mistakes and common misconceptions, and can point out deficiencies across many reviewers
• Many of the comments are superficial or based on incorrect domain knowledge
Agentic AI Systems (updated July 1)
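As a rough illustration of the "LLMs as Metareviewers" setting above, the sketch below asks an LLM to flag problems in a single human-written review; the prompt wording, the example deficiency types, and the gpt-4o-mini model are assumptions for illustration, not the paper's actual annotation scheme.

```python
# Minimal sketch of using an LLM as a metareviewer over one review (illustrative only).
from openai import OpenAI

client = OpenAI()

def metareview(paper_summary: str, review_text: str) -> str:
    prompt = (
        "You are a meta-reviewer. Given a paper summary and one review of that paper, "
        "list any problems in the review (e.g. factual mistakes, superficial comments, "
        "out-of-scope experiment requests), one per line, or 'none'.\n\n"
        f"Paper summary:\n{paper_summary}\n\nReview:\n{review_text}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```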