[mercari GEARS 2025] Evals for LLMApps/Agents
mercari
November 14, 2025
Technology
Transcript
Evals for LLMApps/Agents
Jehandad Kamal
Mercari / Engineer @ AI/LLM Team
Mercari AI Listing Support https://about.mercari.com/en/press/news/articles/20240910_aisupport/
Agents
Agent Development Loop
First SDLC for Agents ai.mercari.com
Types of Evals (by technique)
Types of Evals (by perspective)
Improved SDLC for Agents
Thank You!