Slow Planning with Language Models
• Planning with Large Language Models for Conversational Agents

Tool Use
• BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval
• MetaTool: Facilitating Large Language Models to Master Tools with Meta-task Augmentation
• CIBench: Evaluating Your LLMs with a Code Interpreter Plugin
• WORLDAPIS: The World Is Worth How Many APIs? A Thought Experiment
• Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks
• GTA: A Benchmark for General Tool Agents

Role
• The Oscars of AI Theater: A Survey on Role-Playing with Language Models

Long-Context Understanding
• Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems
via Poisoning Memory or Knowledge Bases
Self-correction: Direct-Inverse Prompting: Analyzing LLMs' Discriminative Capacity in Self-Improving Generation
Knowledge: Knowledge Mechanisms in Large Language Models: A Survey and Perspective
Model: The Llama 3 Herd of Models
Memory: Retrieve, Summarize, Plan: Advancing Multi-hop Question Answering with an Iterative Approach

Agent framework
• AutoFlow: Automated Workflow Generation for Large Language Model Agents
• Transforming Agency
• Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning
• Survey on Large Language Model-Enhanced Reinforcement Learning: Concept, Taxonomy, and Methods
• Converging Paradigms: The Synergy of Symbolic and Connectionist AI in LLM-Empowered Autonomous Agents
Multimodal Agents From Automating Data Science and Engineering Workflows?
• All Roads Lead to Rome: Unveiling the Trajectory of Recommender Systems Across the LLM Era
• Revolutionizing Bridge Operation and Maintenance with LLM-based Agents: An Overview of Applications and Insights
• PersonaRAG: Enhancing Retrieval-Augmented Generation Systems with User-Centric Agents
• A Review of Large Language Models and Autonomous Agents in Chemistry
• AgentInstruct: Toward Generative Teaching with Agentic Flows
• MMedAgent: Learning to Use Medical Tools with Multi-modal Agent
• MIRAI: Evaluating LLM Agents for Event Forecasting
• ProductAgent: Benchmarking Conversational Product Search Agent with Asking Clarification Questions
• InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation
• LLM-Based Open-Domain Integrated Task and Knowledge Assistants with Programmable Policies

Multi Agent Systems
• DebUnc: Mitigating Hallucinations in Large Language Model Agent Communication with Uncertainty Estimations
• Hypothetical Minds: Scaffolding Theory of Mind for Multi-Agent Tasks with Large Language Models
• BMW Agents - A Framework For Task Automation Through Multi-Agent Collaboration
a City at Scale
• ODYSSEY: Empowering Agents with Open-World Skills
• Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models

Computer Controlled Agents
• ASSISTANTBENCH: Can Web Agents Solve Realistic and Time-Consuming Tasks?
• OpenDevin: An Open Platform for AI Software Developers as Generalist Agents
• Agent-E: From Autonomous Web Navigation to Foundational Design Principles in Agentic Systems
• Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence
• Tree Search for Language Model Agents
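Systems Across the LLM Era
• Traces two LLM-centered evolutionary paths, list-wise recommendation and conversational recommendation, and argues that they converge in agents
• Characteristics of intelligence at each level of LLM-powered recommendation agents:
• Lv.1 Compliance: completes recommendation tasks by following instructions predefined by users or developers
• Lv.2 Autonomy: autonomously plans and completes recommendation tasks using memory and various tools
• Lv.3 Personification: provides individualized recommendation services with the expertise and skills of, for example, a salesperson or tour guide
• Lv.4 Self-evolution: autonomously improves and adapts over time

Agentic AI Systems, July 29 update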
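Large Language Models
• Proposes Hypothetical Minds, which infers and adapts to the behaviors and strategies of other agents in multi-agent environments
• The theory-of-mind module generates, evaluates, and refines hypotheses about other agents' strategies and goals
• It then plans and selects its own actions based on the result
• Outperforms prior LLM agents and RL baselines

Multi Agent Systems, July 15 update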
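
The generate, evaluate, refine loop summarized for Hypothetical Minds can be illustrated with a toy sketch. This is not the paper's implementation: the paper's hypotheses come from an LLM, whereas here they are hand-written Python functions, and the rock-paper-scissors setting, the function names (`generate_hypotheses`, `evaluate`, `refine`, `choose_action`), and the `min_accuracy` threshold are all assumptions made only for illustration.

```python
from collections import Counter

# Hypothetical toy setting: repeated rock-paper-scissors against one opponent.
# BEATS[x] is the move that beats x.
BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}


def generate_hypotheses():
    """Candidate opponent strategies (hand-written stand-ins for LLM output).

    Each maps the opponent's move history to a predicted next move.
    """
    return {
        "always_rock": lambda history: "rock",
        "always_paper": lambda history: "paper",
        "always_scissors": lambda history: "scissors",
        "repeat_last": lambda history: history[-1] if history else "rock",
        "cycle": lambda history: BEATS[history[-1]] if history else "rock",
    }


def evaluate(hypotheses, history):
    """Score each hypothesis by how many past opponent moves it predicted."""
    scores = Counter({name: 0 for name in hypotheses})
    for t, actual in enumerate(history):
        for name, predict in hypotheses.items():
            if predict(history[:t]) == actual:
                scores[name] += 1
    return scores


def refine(hypotheses, scores, rounds_seen, min_accuracy=0.5):
    """Drop hypotheses below the accuracy threshold; keep all if none survive."""
    if rounds_seen == 0:
        return hypotheses
    kept = {
        name: h
        for name, h in hypotheses.items()
        if scores[name] / rounds_seen >= min_accuracy
    }
    return kept or hypotheses


def choose_action(hypotheses, scores, history):
    """Predict the opponent's next move with the best hypothesis and counter it."""
    best = max(hypotheses, key=lambda name: scores[name])
    predicted = hypotheses[best](history)
    return BEATS[predicted], best


# Usage: the opponent has been cycling rock -> paper -> scissors.
opponent_moves = ["rock", "paper", "scissors", "rock", "paper"]
hyps = generate_hypotheses()
scores = evaluate(hyps, opponent_moves)
hyps = refine(hyps, scores, len(opponent_moves))
action, belief = choose_action(hyps, scores, opponent_moves)
print(belief, action)
```

In this sketch the agent acts on its single best-supported belief about the opponent; a fuller version would presumably regenerate hypotheses as new observations arrive rather than only pruning a fixed set.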