Papers: November
Profile
• Generative Agent Simulations of 1,000 People
• Multi-expert Prompting Improves Reliability, Safety and Usefulness of Large Language Models
• Two Tales of Persona in LLMs: A Survey of Role-Playing and Personalization
• MorphAgent: Empowering Agents through Self-Evolving Profiles and Decentralized Collaboration
• AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios
Planning
• ACPBench: Reasoning about Action, Change, and Planning
Self-Correction
• Reflection-Bench: probing AI intelligence with reflection
Perception
• IntentGPT: Few-shot Intent Discovery with Large Language Models
• M-LongDoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework
• Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks?