of Chain-of-X Paradigms for LLMs
• ChatShop: Interactive Information Seeking with Language Agents
• Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models
• Can Small Language Models Help Large Language Models Reason Better?: LM-Guided Chain-of-Thought
• Graph of Thoughts: Solving Elaborate Problems with Large Language Models

Memory
• Memory Sharing for Large Language Model based Agents
• A Survey on the Memory Mechanism of Large Language Model based Agents

Agent Evaluation
• Foundational Challenges in Assuring Alignment and Safety of Large Language Models
• GPT in Sheep's Clothing: The Risk of Customized GPTs

Planning
• Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing

Agent Framework
• The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
• Aligning LLM Agents by Learning Latent Preference from User Edits
• AgentKit: Flow Engineering with Graphs, not Coding
• The Landscape of Emerging AI Agent Architectures for Reasoning, Planning, and Tool Calling: A Survey
• GoEX: Perspectives and Designs Towards a Runtime for Autonomous LLM Applications
• AI2Apps: A Visual IDE for Building LLM-based AI Agent Applications
with Large Language Model-based Reasoning
• Automated Social Science: Language Models as Scientist and Subjects
• A Multimodal Automated Interpretability Agent
• ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models
• AutoCodeRover: Autonomous Program Improvement

Multi Agent Systems
• NegotiationToM: A Benchmark for Stress-testing Machine Theory of Mind on Negotiation Surrounding
• AgentCoord: Visually Exploring Coordination Strategy for LLM-based Multi-Agent Collaboration
• Cooperate or Collapse: Emergence of Sustainability Behaviors in a Society of LLM Agents
• Confidence Calibration and Rationalization for LLMs via Multi-Agent Deliberation
• 360°REA: Towards A Reusable Experience Accumulation with 360° Assessment for Multi-Agent System

Computer Controlled Agents
• MMInA: Benchmarking Multihop Multimodal Internet Agents
• OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
• Autonomous Evaluation and Refinement of Digital Agents